graphify

name: graphify
description: "any input (code, docs, papers, images) → knowledge graph → clustered communities → HTML + JSON + audit report. Use when user asks any question about a codebase, project content, architecture, or file relationships — especially if graphify-out/ exists. Provides persistent graph with god nodes, community detection, and BFS/DFS query tools."
trigger: /graphify

/graphify

Turn any folder of files into a navigable knowledge graph with community detection, an honest audit trail, and three outputs: interactive HTML, GraphRAG-ready JSON, and a plain-language GRAPH_REPORT.md.

Usage

/graphify                                             # full pipeline on current directory
/graphify <path>                                      # full pipeline on specific path
/graphify <path> --mode deep                          # thorough extraction, richer INFERRED edges
/graphify <path> --update                             # incremental - re-extract only new/changed files
/graphify <path> --directed                           # build directed graph (preserves edge direction: source→target)
/graphify <path> --cluster-only                       # rerun clustering on existing graph
/graphify <path> --no-viz                             # skip visualization, just report + JSON
/graphify <path> --html                               # (HTML is generated by default - this flag is a no-op)
/graphify <path> --svg                                # also export graph.svg (embeds in Notion, GitHub)
/graphify <path> --graphml                            # export graph.graphml (Gephi, yEd)
/graphify <path> --neo4j                              # generate graphify-out/cypher.txt for Neo4j
/graphify <path> --neo4j-push bolt://localhost:7687   # push directly to Neo4j
/graphify <path> --wiki                               # build agent-crawlable wiki (index.md + one article per community)
/graphify <path> --obsidian --obsidian-dir ~/vaults/my-project  # write vault to custom path (e.g. existing vault)
/graphify <path> --mcp                                # start MCP stdio server for agent access
/graphify <path> --watch                              # watch folder, auto-rebuild on code changes (no LLM needed)
/graphify add <url>                                   # fetch URL, save to ./raw, update graph
/graphify add <url> --author "Name"                   # tag who wrote it
/graphify add <url> --contributor "Name"              # tag who added it to the corpus
/graphify query "<question>"                          # BFS traversal - broad context
/graphify query "<question>" --dfs                    # DFS - trace a specific path
/graphify query "<question>" --budget 1500            # cap answer at N tokens
/graphify path "AuthModule" "Database"                # shortest path between two concepts
/graphify explain "SwinTransformer"                   # plain-language explanation of a node

What graphify is for

graphify is built around Andrej Karpathy's /raw folder workflow: drop anything into a folder — papers, tweets, screenshots, code, notes — and get a structured knowledge graph that shows you what you didn't know was connected.

Three things it does that your AI assistant alone cannot:

Persistent graph — relationships are stored in graphify-out/graph.json and survive across sessions. Ask questions weeks later without re-reading everything.
Honest audit trail — every edge is tagged EXTRACTED, INFERRED, or AMBIGUOUS. You know what was found in the source versus what was inferred.
Cross-document surprise — community detection finds connections between concepts in different files that you would never think to ask about directly.
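
As a rough illustration of how the audit tags can be used, the sketch below tallies edges by tag. The `confidence` field name and the sample edges are assumptions made for this example; check graphify-out/graph.json for the actual schema before relying on it.

```python
from collections import Counter

# Hypothetical edges; the "confidence" key is an assumed field name.
sample_edges = [
    {"source": "AuthModule", "target": "Database", "confidence": "EXTRACTED"},
    {"source": "AuthModule", "target": "SessionCache", "confidence": "INFERRED"},
    {"source": "SessionCache", "target": "Database", "confidence": "AMBIGUOUS"},
    {"source": "Router", "target": "AuthModule", "confidence": "EXTRACTED"},
]


def tally_audit_tags(edges):
    """Count edges per audit tag; untagged edges are counted as AMBIGUOUS in this sketch."""
    return Counter(edge.get("confidence", "AMBIGUOUS") for edge in edges)


counts = tally_audit_tags(sample_edges)
for tag in ("EXTRACTED", "INFERRED", "AMBIGUOUS"):
    print(f"{tag}: {counts[tag]}")
```

A high share of INFERRED or AMBIGUOUS edges is a hint to read the audit report before trusting a surprising connection.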

Use it for:

A codebase you're new to (understand architecture before touching anything)
A reading list (papers + tweets + notes → one navigable graph)
A research corpus (citation graph + concept graph in one)
Your personal /raw folder (drop everything in, let it grow, query it)

What You Must Do When Invoked

If the user invoked /graphify --help or /graphify -h (with no other arguments), print the contents of the Usage section above verbatim and stop. Do not run any commands, do not detect files, and do not default the path to the current directory (.). Just print the Usage block and return.

If no path was given, use . (current directory). Do not ask the user for a path.

Mode dispatcher — pick the right reference before proceeding

Before running the full pipeline below, check the command for these flags / subcommands. If any match, read the listed references/<file>.md FIRST, then either return here for the remaining pipeline steps or stay in that reference until done.

| User invocation | Reference to read | Pipeline impact |
| --- | --- | --- |
| `--update` | `references/update.md` | incremental, then Steps 4–9 here |
| `--cluster-only` | `references/cluster-only.md` | skip Steps 1–3, then Steps 5–9 here |
| `query "..."` / `path A B` / `explain X` | `references/query.md` | query existing graph only, no pipeline |
| `add <url>` | `references/add-watch-hooks.md` | fetch + run `--update` |
| `--watch` | `references/add-watch-hooks.md` | background watcher, no pipeline |
| `hook install/uninstall/status` | `references/add-watch-hooks.md` | git hook setup |
| `claude install/uninstall` | `references/add-watch-hooks.md` | CLAUDE.md integration |
| `--svg` / `--graphml` / `--neo4j` / `--neo4j-push` / `--mcp` | `references/exports.md` | additive after Step 6 |
| Step 2 detects `video` files > 0 | `references/transcribe.md` | runs as Step 2.5 |
| PowerShell scrolling / extraction errors | `references/troubleshooting.md` | post-failure recovery |

Default (no special flags): proceed to Step 1 below.
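
The dispatch rule can be pictured as a lookup from invocation to reference file. The sketch below is only an illustration: the reference paths come from the table above, while the function name `references_for` and the matching logic are assumptions, and it ignores the two condition-based rows (video detection and troubleshooting).

```python
# Flag and subcommand names are taken from the dispatcher table;
# the lookup structure itself is a hypothetical simplification.
FLAG_REFERENCES = {
    "--update": "references/update.md",
    "--cluster-only": "references/cluster-only.md",
    "--watch": "references/add-watch-hooks.md",
    "--svg": "references/exports.md",
    "--graphml": "references/exports.md",
    "--neo4j": "references/exports.md",
    "--neo4j-push": "references/exports.md",
    "--mcp": "references/exports.md",
}
SUBCOMMAND_REFERENCES = {
    "query": "references/query.md",
    "path": "references/query.md",
    "explain": "references/query.md",
    "add": "references/add-watch-hooks.md",
    "hook": "references/add-watch-hooks.md",
    "claude": "references/add-watch-hooks.md",
}


def references_for(args):
    """Return the reference files to read first, in order and without duplicates."""
    found = []
    # A subcommand is only recognised as the first argument.
    if args and args[0] in SUBCOMMAND_REFERENCES:
        found.append(SUBCOMMAND_REFERENCES[args[0]])
    for arg in args:
        ref = FLAG_REFERENCES.get(arg)
        if ref and ref not in found:
            found.append(ref)
    return found


print(references_for(["./src", "--update", "--svg"]))
print(references_for(["./src"]))  # empty list: proceed to Step 1
```

An empty result corresponds to the default case: no reference is needed and the full pipeline runs.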

Follow these steps in order. Do not skip steps.

Step 1 — Ensure graphify is installed

# Detect Python and install graphify if needed
@'
import graphify
'@ | Out-File -FilePath .graphify_step_1_ensure_graphify_is_installed_1.py -Encoding utf8
python .graphify_step_1_ensure_graphify_is_installed_1.py 2>$null
Remove-Item -ErrorAction SilentlyContinue .graphify_step_1_ensure_graphify_is_installed_1.py
if ($LASTEXITCODE -ne 0) {
    if (Get-Command uv -ErrorAction SilentlyContinue) {
        uv tool install --upgrade graphifyy -q 2>&1 | Select-Object -Last 3
    } else {
        pip install graphifyy -q 2>&1 | Select-Object -Last 3
    }
}
# Write interpreter path for all subsequent steps
@'
import sys; open('.graphify_python', 'w', encoding='utf-8').write(sys.executable)
'@ | Out-File -FilePath .graphify_step_1_ensure_graphify_is_installed_2.py -Encoding utf8
python .graphify_step_1_ensure_graphify_is_installed_2.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_1_ensure_graphify_is_installed_2.py

If the import succeeds, print nothing and move straight to Step 2.

Step 2 — Detect files

@'
import json
from graphify.detect import detect
from pathlib import Path
result = detect(Path('INPUT_PATH'))
print(json.dumps(result, ensure_ascii=False))
'@ | Out-File -FilePath .graphify_step_2_detect_files_3.py -Encoding utf8
python .graphify_step_2_detect_files_3.py > .graphify_detect.json
Remove-Item -ErrorAction SilentlyContinue .graphify_step_2_detect_files_3.py

Replace INPUT_PATH with the actual path the user provided. Do NOT cat or print the JSON — read it silently and present a clean summary instead:

Corpus: X files · ~Y words
  code:     N files (.py .ts .go ...)
  docs:     N files (.md .txt ...)
  papers:   N files (.pdf ...)
  images:   N files
  video:    N files (.mp4 .mp3 ...)

Omit any category with 0 files from the summary.
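
A minimal sketch of how such a summary could be assembled from the detection result. It assumes the JSON has `total_files`, `total_words`, and a `files` mapping from category name to a list of paths; the helper name `format_summary` and the sample data are hypothetical.

```python
from pathlib import Path


def format_summary(detection):
    """Build the corpus summary, leaving out categories that have no files."""
    lines = [f"Corpus: {detection['total_files']} files · ~{detection['total_words']:,} words"]
    for category, paths in detection.get("files", {}).items():
        if not paths:
            continue  # omit any category with 0 files
        # Collect the distinct file extensions seen in this category.
        extensions = sorted({Path(p).suffix for p in paths if Path(p).suffix})
        suffix_note = f" ({' '.join(extensions)})" if extensions else ""
        lines.append(f"  {category + ':':<9} {len(paths)} files{suffix_note}")
    return "\n".join(lines)


# Hypothetical detection result used only for illustration.
sample = {
    "total_files": 3,
    "total_words": 12000,
    "files": {"code": ["app/main.py", "app/util.ts"], "docs": ["README.md"], "images": []},
}
print(format_summary(sample))
```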

Then act on it:

If total_files is 0: stop with "No supported files found in [path]."
If skipped_sensitive is non-empty: report how many files were skipped, but not their names.
If total_words > 2,000,000 OR total_files > 200: warn the user that the corpus is large, show the top 5 subdirectories by file count, then ask which subfolder to run on. Wait for the user's answer before proceeding.
If video files were detected: read references/transcribe.md and run Step 2.5 before continuing.
Otherwise: proceed directly to Step 3.
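
The decision rules above can be sketched as a small function. This is an illustration rather than part of graphify: the function name and the returned action labels are hypothetical, and it assumes `skipped_sensitive` is a list and that video files appear under `files["video"]`.

```python
def plan_after_detection(detection):
    """Return (action, notices) for a detection result, following the rules above."""
    notices = []
    skipped = detection.get("skipped_sensitive", [])
    if skipped:
        # Report only the count, never the file names.
        notices.append(f"{len(skipped)} sensitive file(s) skipped")

    total_files = detection.get("total_files", 0)
    total_words = detection.get("total_words", 0)

    if total_files == 0:
        return "stop_no_files", notices
    if total_words > 2_000_000 or total_files > 200:
        return "ask_for_subfolder", notices
    if detection.get("files", {}).get("video"):
        return "transcribe_then_extract", notices
    return "extract", notices


# Hypothetical detection result: small corpus with one video file.
example = {
    "total_files": 12,
    "total_words": 40_000,
    "skipped_sensitive": [".env"],
    "files": {"code": ["a.py"], "video": ["talk.mp4"]},
}
print(plan_after_detection(example))
```

The size check comes before the video check so that no transcription work starts on a corpus the user may still want to narrow down.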

Step 3 — Extract entities and relationships

Before starting: note whether --mode deep was given. You must pass DEEP_MODE=true to every subagent in Step B2 if it was. Track this from the original invocation — do not lose it.

This step has two parts: structural extraction (deterministic, free) and semantic extraction (your AI model, costs tokens).

Run Part A (AST) and Part B (semantic) in parallel. Dispatch all semantic subagents AND start AST extraction in the same message. Both can run simultaneously since they operate on different file types. Merge results in Part C.

Parallelizing AST + semantic saves 5–15s on large corpora. AST is deterministic and fast; start it while subagents are processing docs/papers.

Part A — Structural extraction for code files

For any code files detected, run AST extraction in parallel with Part B subagents:

@'
import json
from graphify.extract import collect_files, extract
from pathlib import Path


def main():
    code_files = []
    detect = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))
    for f in detect.get('files', {}).get('code', []):
        code_files.extend(collect_files(Path(f)) if Path(f).is_dir() else [Path(f)])

    if code_files:
        result = extract(code_files)
        Path('.graphify_ast.json').write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
        print(f'AST: {len(result["nodes"])} nodes, {len(result["edges"])} edges')
    else:
        Path('.graphify_ast.json').write_text(json.dumps({'nodes':[],'edges':[],'input_tokens':0,'output_tokens':0}, ensure_ascii=False), encoding="utf-8")
        print('No code files - skipping AST extraction')


# Windows-spawn ProcessPoolExecutor (used inside extract()) re-imports this
# script in each worker; without an `if __name__ == "__main__":` guard the
# pool would recursively spawn itself. graphify v0.7.11+ falls back to
# sequential extraction if the pool dies, but the guard keeps multi-core
# extraction working on Windows.
if __name__ == '__main__':
    main()
'@ | Out-File -FilePath .graphify_step_ast.py -Encoding utf8
python .graphify_step_ast.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_ast.py

Part B — Semantic extraction (parallel subagents)

Fast path: If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code — there is nothing for semantic subagents to do.

MANDATORY: You MUST use the Agent tool here. Reading files yourself one-by-one is forbidden — it is 5–10x slower. If you do not use the Agent tool you are doing this wrong.

Before dispatching subagents, print a timing estimate:

Load total_words and file counts from .graphify_detect.json
Estimate agents needed: ceil(uncached_non_code_files / 22) (chunk size is 20–25)
Estimate time: ~45s per agent batch (they run in parallel, so total ≈ 45s × ceil(agents/parallel_limit))
Print: "Semantic extraction: ~N files → X agents, estimated ~Ys"
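
The estimate is simple arithmetic, sketched below. The function name is hypothetical and the `parallel_limit` of 4 in the example is an assumed value; use whatever limit your agent runtime actually imposes.

```python
import math


def estimate_semantic_run(uncached_non_code_files, parallel_limit,
                          files_per_agent=22, seconds_per_batch=45):
    """Return (agents, estimated_seconds) using the rule of thumb above."""
    agents = math.ceil(uncached_non_code_files / files_per_agent)
    # Agents run in parallel, so time scales with the number of batches.
    batches = math.ceil(agents / parallel_limit)
    return agents, batches * seconds_per_batch


n_files = 100
agents, seconds = estimate_semantic_run(n_files, parallel_limit=4)
print(f"Semantic extraction: ~{n_files} files → {agents} agents, estimated ~{seconds}s")
```

With 100 files, ceil(100 / 22) gives 5 agents; at an assumed limit of 4 parallel agents that is 2 batches, or about 90 seconds.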

Step B0 — Check extraction cache first

Before dispatching any subagents, check which files already have cached extraction results:

@'
import json
from graphify.cache import check_semantic_cache
from pathlib import Path

detect = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))
all_files = [f for files in detect['files'].values() for f in files]

cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(all_files)

if cached_nodes or cached_edges or cached_hyperedges:
    Path('.graphify_cached.json').write_text(json.dumps({'nodes': cached_nodes, 'edges': cached_edges, 'hyperedges': cached_hyperedges}, ensure_ascii=False), encoding="utf-8")
Path('.graphify_uncached.txt').write_text('\n'.join(uncached), encoding="utf-8")
print(f'Cache: {len(all_files)-len(uncached)} files hit, {len(uncached)} files need extraction')
'@ | Out-File -FilePath .graphify_step_3_extract_entities_and_relations_5.py -Encoding utf8
python .graphify_step_3_extract_entities_and_relations_5.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_3_extract_entities_and_relations_5.py

Only dispatch subagents for files listed in .graphify_uncached.txt. If all files are cached, skip to Part C directly.

Step B1 — Split into chunks

Load files from .graphify_uncached.txt. Split into chunks of 20–25 files each. Each image gets its own chunk (vision needs separate context).
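
One way to implement this split is sketched below. The image extension set and the function name are assumptions for illustration, and the final non-image chunk may hold fewer than 20 files when the count does not divide evenly.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever detection reports as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_into_chunks(files, chunk_size=22):
    """Group non-image files into chunks and give each image its own chunk."""
    images = [f for f in files if Path(f).suffix.lower() in IMAGE_SUFFIXES]
    others = [f for f in files if Path(f).suffix.lower() not in IMAGE_SUFFIXES]
    chunks = [others[i:i + chunk_size] for i in range(0, len(others), chunk_size)]
    # Each image is dispatched alone so vision gets a separate context.
    chunks.extend([image] for image in images)
    return chunks


# Hypothetical corpus: 50 notes and 2 screenshots.
files = [f"notes/doc_{i}.md" for i in range(50)] + ["shots/a.png", "shots/b.JPG"]
for number, chunk in enumerate(split_into_chunks(files), start=1):
    print(f"chunk {number}: {len(chunk)} file(s)")
```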

Step B2 — Dispatch ALL subagents in a single message

Call the Agent tool multiple times IN THE SAME RESPONSE — one call per chunk. This is the only way they run in parallel. If you make one Agent call, wait, then make another, you are doing it sequentially and defeating the purpose.

Concrete example for 3 chunks:

[Agent tool call 1: files 1-22]
[Agent tool call 2: files 23-44]
[Agent tool call 3: files 45-66]

All three in one message. Not three separate messages.

The exact subagent prompt (rules, confidence rubric, output JSON schema) is in references/extraction-prompt.md. Read that file and use it verbatim, substituting FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, and DEEP_MODE.

Step B3 — Collect, cache, and merge

Wait for all subagents. For each result:

Check that graphify-out/.graphify_chunk_NN.json exists on disk — this is the success signal
If the file exists and contains valid JSON with nodes and edges, include it and save to cache
If the file is missing, the subagent was likely dispatched as read-only (Explore type) — print a warning: "chunk N missing from disk — subagent may have been read-only. Re-run with general-purpose agent." Do not silently skip.
If a subagent failed or returned invalid JSON, print a warning and skip that chunk — do not abort

If more than half the chunks failed or are missing, stop and tell the user to re-run and ensure subagent_type="general-purpose" is used.
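
The checks above could be expressed roughly as follows. This is a sketch, not graphify's own code: the function name is hypothetical, and it assumes chunk files are numbered from 1 with two-digit zero padding (the NN in the file name).

```python
import json
from pathlib import Path


def check_chunks(out_dir, total_chunks):
    """Return (usable_paths, warnings, should_stop) for the expected chunk files."""
    usable, warnings = [], []
    for n in range(1, total_chunks + 1):
        path = Path(out_dir) / f".graphify_chunk_{n:02d}.json"
        if not path.exists():
            warnings.append(
                f"chunk {n} missing from disk - subagent may have been read-only. "
                "Re-run with general-purpose agent."
            )
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            warnings.append(f"chunk {n} contains invalid JSON - skipping")
            continue
        if not isinstance(data, dict) or "nodes" not in data or "edges" not in data:
            warnings.append(f"chunk {n} lacks nodes or edges - skipping")
            continue
        usable.append(path)
    # Stop when more than half of the chunks failed or are missing.
    should_stop = len(warnings) > total_chunks / 2
    return usable, warnings, should_stop
```

Calling `check_chunks("graphify-out", total_chunks)` after all subagents return gives the list of files to merge, the warnings to print, and whether to stop and ask the user to re-run.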

After each Agent call completes, read the real token counts from the Agent tool result's usage field and write them back into that chunk's JSON, because the chunk JSON itself always contains placeholder zeros. Once every chunk has been updated, merge all chunk files into .graphify_semantic_new.json:

@'
import glob
import json
from pathlib import Path

# Combine every chunk written by the subagents and sum the token counts
chunks = sorted(glob.glob('graphify-out/.graphify_chunk_*.json'))
all_nodes, all_edges, all_hyperedges = [], [], []
total_in, total_out = 0, 0
for c in chunks:
    d = json.loads(Path(c).read_text(encoding="utf-8"))
    all_nodes += d.get('nodes', [])
    all_edges += d.get('edges', [])
    all_hyperedges += d.get('hyperedges', [])
    total_in += d.get('input_tokens', 0)
    total_out += d.get('output_tokens', 0)
# Write to the working directory, where the cache and merge scripts read it
Path('.graphify_semantic_new.json').write_text(json.dumps({
    'nodes': all_nodes, 'edges': all_edges, 'hyperedges': all_hyperedges,
    'input_tokens': total_in, 'output_tokens': total_out,
}, indent=2, ensure_ascii=False), encoding="utf-8")
print(f'Merged {len(chunks)} chunks: {total_in:,} in / {total_out:,} out tokens')
'@ | Out-File -FilePath .graphify_step_3_merge_chunks.py -Encoding utf8
python .graphify_step_3_merge_chunks.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_3_merge_chunks.py

Save new results to cache:

@'
import json
from graphify.cache import save_semantic_cache
from pathlib import Path

new = json.loads(Path('.graphify_semantic_new.json').read_text(encoding="utf-8")) if Path('.graphify_semantic_new.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}
saved = save_semantic_cache(new.get('nodes', []), new.get('edges', []), new.get('hyperedges', []))
print(f'Cached {saved} files')
'@ | Out-File -FilePath .graphify_step_3_extract_entities_and_relations_6.py -Encoding utf8
python .graphify_step_3_extract_entities_and_relations_6.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_3_extract_entities_and_relations_6.py

Merge cached + new results into .graphify_semantic.json:

@'
import json
from pathlib import Path

cached = json.loads(Path('.graphify_cached.json').read_text(encoding="utf-8")) if Path('.graphify_cached.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}
new = json.loads(Path('.graphify_semantic_new.json').read_text(encoding="utf-8")) if Path('.graphify_semantic_new.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}

all_nodes = cached['nodes'] + new.get('nodes', [])
all_edges = cached['edges'] + new.get('edges', [])
all_hyperedges = cached.get('hyperedges', []) + new.get('hyperedges', [])
seen = set()
deduped = []
for n in all_nodes:
    if n['id'] not in seen:
        seen.add(n['id'])
        deduped.append(n)

merged = {
    'nodes': deduped,
    'edges': all_edges,
    'hyperedges': all_hyperedges,
    'input_tokens': new.get('input_tokens', 0),
    'output_tokens': new.get('output_tokens', 0),
}
Path('.graphify_semantic.json').write_text(json.dumps(merged, indent=2, ensure_ascii=False), encoding="utf-8")
print(f'Extraction complete - {len(deduped)} nodes, {len(all_edges)} edges ({len(cached["nodes"])} from cache, {len(new.get("nodes",[]))} new)')
'@ | Out-File -FilePath .graphify_step_3_extract_entities_and_relations_7.py -Encoding utf8
python .graphify_step_3_extract_entities_and_relations_7.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_3_extract_entities_and_relations_7.py

Clean up temp files: Remove-Item -ErrorAction SilentlyContinue .graphify_cached.json, .graphify_uncached.txt, .graphify_semantic_new.json

Part C — Merge AST + semantic into final extraction

@'
import sys, json
from pathlib import Path

ast = json.loads(Path('.graphify_ast.json').read_text(encoding="utf-8"))
sem = json.loads(Path('.graphify_semantic.json').read_text(encoding="utf-8"))

# Merge: AST nodes first, semantic nodes deduplicated by id
seen = {n['id'] for n in ast['nodes']}
merged_nodes = list(ast['nodes'])
for n in sem['nodes']:
    if n['id'] not in seen:
        merged_nodes.append(n)
        seen.add(n['id'])

merged_edges = ast['edges'] + sem['edges']
merged_hyperedges = sem.get('hyperedges', [])
merged = {
    'nodes': merged_nodes,
    'edges': merged_edges,
    'hyperedges': merged_hyperedges,
    'input_tokens': sem.get('input_tokens', 0),
    'output_tokens': sem.get('output_tokens', 0),
}
Path('.graphify_extract.json').write_text(json.dumps(merged, indent=2, ensure_ascii=False), encoding="utf-8")
total = len(merged_nodes)
edges = len(merged_edges)
print(f'Merged: {total} nodes, {edges} edges ({len(ast["nodes"])} AST + {len(sem["nodes"])} semantic)')
'@ | Out-File -FilePath .graphify_step_3_extract_entities_and_relations_8.py -Encoding utf8
python .graphify_step_3_extract_entities_and_relations_8.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_3_extract_entities_and_relations_8.py

Step 4 — Build graph, cluster, analyze, generate outputs

New-Item -ItemType Directory -Force -Path graphify-out | Out-Null
@'
import sys, json
from graphify.build import build_from_json
from graphify.cluster import cluster, score_all
from graphify.analyze import god_nodes, surprising_connections, suggest_questions
from graphify.report import generate
from graphify.export import to_json
from pathlib import Path

extraction = json.loads(Path('.graphify_extract.json').read_text(encoding="utf-8"))
detection  = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))

G = build_from_json(extraction)
communities = cluster(G)
cohesion = score_all(G, communities)
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
gods = god_nodes(G)
surprises = surprising_connections(G, communities)
labels = {cid: 'Community ' + str(cid) for cid in communities}
# Placeholder questions - regenerated with real labels in Step 5
questions = suggest_questions(G, communities, labels)

report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, 'INPUT_PATH', suggested_questions=questions)
Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding="utf-8")
to_json(G, communities, 'graphify-out/graph.json')

analysis = {
    'communities': {str(k): v for k, v in communities.items()},
    'cohesion': {str(k): v for k, v in cohesion.items()},
    'gods': gods,
    'surprises': surprises,
    'questions': questions,
}
Path('.graphify_analysis.json').write_text(json.dumps(analysis, indent=2, ensure_ascii=False), encoding="utf-8")
if G.number_of_nodes() == 0:
    print('ERROR: Graph is empty - extraction produced no nodes.')
    print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
    raise SystemExit(1)
print(f'Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges, {len(communities)} communities')
'@ | Out-File -FilePath .graphify_step_4_build_graph_cluster_analyze_ge_9.py -Encoding utf8
python .graphify_step_4_build_graph_cluster_analyze_ge_9.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_4_build_graph_cluster_analyze_ge_9.py

If this step prints ERROR: Graph is empty, stop and tell the user what happened — do not proceed to labeling or visualization. See references/troubleshooting.md.

Replace INPUT_PATH with the actual path.

Step 5 — Label communities

Read .graphify_analysis.json. For each community key, look at its node labels and write a 2–5 word plain-language name (e.g. "Attention Mechanism", "Training Pipeline", "Data Loading").

Then regenerate the report and save the labels for the visualizer:

@'
import sys, json
from graphify.build import build_from_json
from graphify.cluster import score_all
from graphify.analyze import god_nodes, surprising_connections, suggest_questions
from graphify.report import generate
from pathlib import Path

extraction = json.loads(Path('.graphify_extract.json').read_text(encoding="utf-8"))
detection  = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))
analysis   = json.loads(Path('.graphify_analysis.json').read_text(encoding="utf-8"))

G = build_from_json(extraction)
communities = {int(k): v for k, v in analysis['communities'].items()}
cohesion = {int(k): v for k, v in analysis['cohesion'].items()}
tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}

# LABELS - replace these with the names you chose above
labels = LABELS_DICT

# Regenerate questions with real community labels (labels affect question phrasing)
questions = suggest_questions(G, communities, labels)

report = generate(G, communities, cohesion, labels, analysis['gods'], analysis['surprises'], detection, tokens, 'INPUT_PATH', suggested_questions=questions)
Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding="utf-8")
Path('.graphify_labels.json').write_text(json.dumps({str(k): v for k, v in labels.items()}, ensure_ascii=False), encoding="utf-8")
print('Report updated with community labels')
'@ | Out-File -FilePath .graphify_step_5_label_communities_10.py -Encoding utf8
python .graphify_step_5_label_communities_10.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_5_label_communities_10.py

Replace LABELS_DICT with the actual dict you constructed (e.g. {0: "Attention Mechanism", 1: "Training Pipeline"}). Replace INPUT_PATH with the actual path.

Step 6 — Generate Obsidian vault (opt-in) + HTML

Always generate the HTML (unless --no-viz was given). Generate the Obsidian vault only if --obsidian was explicitly given; otherwise skip it, because it writes one file per node.

If --obsidian was given:

If --obsidian-dir <path> was also given, use that path as the vault directory. Otherwise default to graphify-out/obsidian.

@'
import sys, json
from graphify.build import build_from_json
from graphify.export import to_obsidian, to_canvas
from pathlib import Path

extraction = json.loads(Path('.graphify_extract.json').read_text(encoding="utf-8"))
analysis   = json.loads(Path('.graphify_analysis.json').read_text(encoding="utf-8"))
labels_raw = json.loads(Path('.graphify_labels.json').read_text(encoding="utf-8")) if Path('.graphify_labels.json').exists() else {}

G = build_from_json(extraction)
communities = {int(k): v for k, v in analysis['communities'].items()}
cohesion = {int(k): v for k, v in analysis['cohesion'].items()}
labels = {int(k): v for k, v in labels_raw.items()}

obsidian_dir = 'OBSIDIAN_DIR'  # replace with --obsidian-dir value, or 'graphify-out/obsidian' if not given

n = to_obsidian(G, communities, obsidian_dir, community_labels=labels or None, cohesion=cohesion)
print(f'Obsidian vault: {n} notes in {obsidian_dir}/')

to_canvas(G, communities, f'{obsidian_dir}/graph.canvas', community_labels=labels or None)
print(f'Canvas: {obsidian_dir}/graph.canvas - open in Obsidian for structured community layout')
print()
print(f'Open {obsidian_dir}/ as a vault in Obsidian.')
print('  Graph view   - nodes colored by community (set automatically)')
print('  graph.canvas - structured layout with communities as groups')
print('  _COMMUNITY_* - overview notes with cohesion scores and dataview queries')
'@ | Out-File -FilePath .graphify_step_6_generate_obsidian_vault_opt_in_11.py -Encoding utf8
python .graphify_step_6_generate_obsidian_vault_opt_in_11.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_6_generate_obsidian_vault_opt_in_11.py

Generate the HTML graph (always, unless --no-viz):

@'
import sys, json
from graphify.build import build_from_json
from graphify.export import to_html
from pathlib import Path

extraction = json.loads(Path('.graphify_extract.json').read_text(encoding="utf-8"))
analysis   = json.loads(Path('.graphify_analysis.json').read_text(encoding="utf-8"))
labels_raw = json.loads(Path('.graphify_labels.json').read_text(encoding="utf-8")) if Path('.graphify_labels.json').exists() else {}

G = build_from_json(extraction)
communities = {int(k): v for k, v in analysis['communities'].items()}
labels = {int(k): v for k, v in labels_raw.items()}

if G.number_of_nodes() > 5000:
    print(f'Graph has {G.number_of_nodes()} nodes - too large for HTML viz. Use Obsidian vault instead.')
else:
    to_html(G, communities, 'graphify-out/graph.html', community_labels=labels or None)
    print('graph.html written - open in any browser, no server needed')
'@ | Out-File -FilePath .graphify_step_6_generate_obsidian_vault_opt_in_12.py -Encoding utf8
python .graphify_step_6_generate_obsidian_vault_opt_in_12.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_6_generate_obsidian_vault_opt_in_12.py

Step 7 — Optional alternate exports

If the user passed any of --svg, --graphml, --neo4j, --neo4j-push, or --mcp, read references/exports.md and run the matching block(s) before continuing to Step 8. Skip this step entirely if none of those flags were given.

Step 8 — Token reduction benchmark (only if `total_words > 5000`)

If total_words from .graphify_detect.json is greater than 5,000, run:

@'
import json
from graphify.benchmark import run_benchmark, print_benchmark
from pathlib import Path

detection = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))
result = run_benchmark('graphify-out/graph.json', corpus_words=detection['total_words'])
print_benchmark(result)
'@ | Out-File -FilePath .graphify_step_8_token_reduction_benchmark_only_17.py -Encoding utf8
python .graphify_step_8_token_reduction_benchmark_only_17.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_8_token_reduction_benchmark_only_17.py

Print the output directly in chat. If total_words <= 5000, skip silently — the graph value is structural clarity, not token compression, for small corpora.

Step 9 — Save manifest, update cost tracker, clean up, and report

@'
import json
from pathlib import Path
from datetime import datetime, timezone
from graphify.detect import save_manifest

# Save manifest for --update
detect = json.loads(Path('.graphify_detect.json').read_text(encoding="utf-8"))
save_manifest(detect['files'])

# Update cumulative cost tracker
extract = json.loads(Path('.graphify_extract.json').read_text(encoding="utf-8"))
input_tok = extract.get('input_tokens', 0)
output_tok = extract.get('output_tokens', 0)

cost_path = Path('graphify-out/cost.json')
if cost_path.exists():
    cost = json.loads(cost_path.read_text(encoding="utf-8"))
else:
    cost = {'runs': [], 'total_input_tokens': 0, 'total_output_tokens': 0}

cost['runs'].append({
    'date': datetime.now(timezone.utc).isoformat(),
    'input_tokens': input_tok,
    'output_tokens': output_tok,
    'files': detect.get('total_files', 0),
})
cost['total_input_tokens'] += input_tok
cost['total_output_tokens'] += output_tok
cost_path.write_text(json.dumps(cost, indent=2, ensure_ascii=False), encoding="utf-8")

print(f'This run: {input_tok:,} input tokens, {output_tok:,} output tokens')
print(f'All time: {cost["total_input_tokens"]:,} input, {cost["total_output_tokens"]:,} output ({len(cost["runs"])} runs)')
'@ | Out-File -FilePath .graphify_step_9_save_manifest_update_cost_trac_18.py -Encoding utf8
python .graphify_step_9_save_manifest_update_cost_trac_18.py
Remove-Item -ErrorAction SilentlyContinue .graphify_step_9_save_manifest_update_cost_trac_18.py
Remove-Item -ErrorAction SilentlyContinue .graphify_detect.json, .graphify_extract.json, .graphify_ast.json, .graphify_semantic.json, .graphify_analysis.json, .graphify_labels.json
Remove-Item -ErrorAction SilentlyContinue graphify-out/.needs_update

Tell the user (omit the obsidian line unless --obsidian was given):

Graph complete. Outputs in PATH_TO_DIR/graphify-out/

  graph.html            - interactive graph, open in browser
  GRAPH_REPORT.md       - audit report
  graph.json            - raw graph data
  obsidian/             - Obsidian vault (only if --obsidian was given)

If graphify saved you time, consider supporting it: https://github.com/sponsors/safishamsi

Replace PATH_TO_DIR with the actual absolute path of the directory that was processed.

Then paste these sections from GRAPH_REPORT.md directly into the chat:

God Nodes
Surprising Connections
Suggested Questions

Do NOT paste the full report — just those three sections. Keep it concise.

Then immediately offer to explore. Pick the single most interesting suggested question from the report — the one that crosses the most community boundaries or has the most surprising bridge node — and ask:

"The most interesting question this graph can answer: [question]. Want me to trace it?"

If the user says yes, run /graphify query "[question]" on the graph (see references/query.md) and walk them through the answer using the graph structure — which nodes connect, which community boundaries get crossed, what the path reveals. Keep going as long as they want to explore. Each answer should end with a natural follow-up ("this connects to X — want to go deeper?") so the session feels like navigation, not a one-shot report.

The graph is the map. Your job after the pipeline is to be the guide.

Honesty Rules

Never invent an edge. If unsure, use AMBIGUOUS.
Never skip the corpus check warning.
Always show token cost in the report.
Never hide cohesion scores behind symbols — show the raw number.
Never run HTML viz on a graph with more than 5,000 nodes without warning the user.