name: build-knowledge-graph
description: Reverse engineer codebase architecture and build a knowledge graph using gnapsis MCP tools. Use when the user wants to map domains, features, modules, components, and their relationships.
argument-hint:
Build Knowledge Graph
Reverse engineer the architecture of the codebase and build a comprehensive knowledge graph using the gnapsis MCP tools, following a strict derivation order methodology.
If an argument is provided, focus the analysis on that scope or path. Otherwise, analyze the entire codebase.
CRITICAL RULES
- Always use gnapsis MCP tools to register all nodes and relationships in the knowledge graph. The gnapsis graph is the primary output.
- Proceed through phases and summarize at each phase boundary: complete each phase fully, then summarize what was done before moving to the next.
- Use best judgment for ambiguity: When you encounter ambiguity about business domains, feature boundaries, or architectural decisions, use your best technical judgment and document your reasoning. Add a note in the entity description when a decision was ambiguous.
- ALWAYS run analyze_document before creating entities for a source file. This gives you the exact LSP symbol names. Never guess symbol names.
- Use ref_type: "code" with lsp_symbol for source files. Only use ref_type: "text" for markdown, docs, and config files.
- Be exhaustive, not superficial. Scan every directory, every configuration file, every module entry point. Leave no stone unturned.
GNAPSIS TOOL REFERENCE
Initialization & Discovery
- init_project: Initialize the database schema (run once at start)
- project_overview: Get the current ontology (taxonomy with categories by scope, entity hierarchy, statistics)
Entity Lifecycle
- create_entity(name, description, category_ids, parent_ids, commands): Create an entity with at least one reference
- update_entity(entity_id, ...): Update an entity, add/remove references, create relationships
- delete_entity(entity_id): Delete an entity (must have no children)
Taxonomy
- create_category(name, scope): Create a new category at a scope (if the defaults don't fit)
Querying
- get_entity(entity_id): Full entity details with references and relationships
- find_entities(scope, category, parent_id): Filter entities by scope/category/parent
- search(query): Semantic search across entities and references
- query(entity_id, semantic_query): Extract the relevant subgraph within a token budget
- get_document_entities(document_path): Get all entities referenced in a file
Document Analysis
- analyze_document(document_path): CRITICAL. Discover tracked refs, untracked LSP symbols, and git diffs. Run this BEFORE creating entities for any source file.
Maintenance
- alter_references(commands): Bulk update/delete references
- validate_graph(): Check for orphans, cycles, scope violations, missing references
- get_changed_files(): Find files modified since last sync
Entity Commands (used in create_entity/update_entity commands array)
- { type: "add", ref_type: "code", document_path: "...", lsp_symbol: "...", description: "..." }: For source files
- { type: "add", ref_type: "text", document_path: "...", start_line: N, end_line: M, description: "..." }: For docs/config
- { type: "relate", entity_id: "...", note: "..." }: Create a RELATED_TO relationship
- { type: "link", entity_id: "...", link_type: "calls|imports|implements|instantiates" }: Code links (Component/Unit only)
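The command shapes above can be sketched as plain dictionaries. This is only an illustration of the payloads listed in this section: the helper names (add_code_ref, add_text_ref, relate, link) are hypothetical and are not part of gnapsis, and the line-range check is an added assumption.

```python
from typing import Any

# Link types taken from the "link" command listed above.
LINK_TYPES = {"calls", "imports", "implements", "instantiates"}


def add_code_ref(document_path: str, lsp_symbol: str, description: str) -> dict[str, Any]:
    """Build an "add" command for a source-file reference (hypothetical helper)."""
    return {
        "type": "add",
        "ref_type": "code",
        "document_path": document_path,
        "lsp_symbol": lsp_symbol,
        "description": description,
    }


def add_text_ref(document_path: str, start_line: int, end_line: int, description: str) -> dict[str, Any]:
    """Build an "add" command for a docs/config reference (hypothetical helper)."""
    if start_line > end_line:
        # Assumption: a reversed line range is never intended.
        raise ValueError("start_line must not exceed end_line")
    return {
        "type": "add",
        "ref_type": "text",
        "document_path": document_path,
        "start_line": start_line,
        "end_line": end_line,
        "description": description,
    }


def relate(entity_id: str, note: str) -> dict[str, Any]:
    """Build a "relate" command (RELATED_TO relationship)."""
    return {"type": "relate", "entity_id": entity_id, "note": note}


def link(entity_id: str, link_type: str) -> dict[str, Any]:
    """Build a "link" command; rejects link types not listed above."""
    if link_type not in LINK_TYPES:
        raise ValueError(f"unknown link_type: {link_type}")
    return {"type": "link", "entity_id": entity_id, "link_type": link_type}
```

Rejecting unknown link types before the tool call keeps relationship labels limited to the four listed values.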
Default Categories (from project_overview)
| Scope | Categories |
|---|---|
| Domain | core, infrastructure |
| Feature | functional, technical, non-functional |
| Namespace | module, library |
| Component | struct, trait, enum, class |
| Unit | function, method, constant, field |
MANDATORY WORKFLOW: Creating Entities from Source Files
1. analyze_document(document_path: "src/foo.rs")
   → Returns untracked[] with exact LSP symbol names
2. create_entity(
     name: "FooService",
     description: "Service for foo operations",
     category_ids: ["<struct-category-id>"],
     parent_ids: ["<parent-namespace-id>"],
     commands: [{
       type: "add",
       ref_type: "code",
       document_path: "src/foo.rs",
       lsp_symbol: "FooService",  ← MUST match analyze_document output
       description: "FooService struct"
     }]
   )
NEVER skip analyze_document. NEVER guess lsp_symbol names.
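The rule above can be expressed as a guard that refuses to build create_entity arguments for a symbol that analyze_document did not report. This is a minimal sketch under loud assumptions: the analysis result is modeled as a dict whose "untracked" key holds a list of symbol-name strings, which may not match the real response shape, and build_create_entity_args is a hypothetical helper.

```python
from typing import Any


def build_create_entity_args(
    analysis: dict[str, Any],
    symbol: str,
    description: str,
    category_id: str,
    parent_id: str,
    document_path: str,
) -> dict[str, Any]:
    """Return create_entity arguments only if `symbol` was reported as untracked.

    Assumption: analysis["untracked"] is a list of exact LSP symbol names.
    """
    untracked = set(analysis.get("untracked", []))
    if symbol not in untracked:
        raise ValueError(
            f"{symbol!r} not found in analyze_document output for {document_path}"
        )
    return {
        "name": symbol,
        "description": description,
        "category_ids": [category_id],
        "parent_ids": [parent_id],
        "commands": [
            {
                "type": "add",
                "ref_type": "code",
                "document_path": document_path,
                "lsp_symbol": symbol,  # copied from the analysis, never guessed
                "description": description,
            }
        ],
    }
```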
PHASE 1: STACK IDENTIFICATION
Before anything else, build a complete understanding of the project's technology stack.
Steps:
- Scan the root directory for configuration files: package.json, Cargo.toml, go.mod, pom.xml, build.gradle, requirements.txt, pyproject.toml, Gemfile, composer.json, Makefile, Dockerfile, docker-compose.yml, CI/CD configs, etc.
- Identify the primary language(s) and their versions.
- Catalog all libraries and frameworks with their roles (web framework, ORM, testing, logging, auth, etc.).
- Map the persistence layer: databases, migration tools, ORM configurations.
- Identify service dependencies: external APIs, microservice connections, message brokers, cloud services.
- Identify infrastructure patterns: containerization, orchestration, CI/CD pipelines, deployment targets.
- Identify architectural style: monolith, microservices, modular monolith, serverless, event-driven, hexagonal, clean architecture, MVC, CQRS, etc.
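The first step, scanning the root for configuration files, might look like the sketch below. The file names come from the list above; the ecosystem labels and the find_config_files helper are illustrative additions, and CI/CD configs are left out because their paths vary by provider.

```python
from pathlib import Path

# File names from the step above, mapped to the ecosystem they usually indicate.
CONFIG_FILES = {
    "package.json": "Node.js (npm)",
    "Cargo.toml": "Rust (Cargo)",
    "go.mod": "Go modules",
    "pom.xml": "Java (Maven)",
    "build.gradle": "JVM (Gradle)",
    "requirements.txt": "Python (pip)",
    "pyproject.toml": "Python",
    "Gemfile": "Ruby (Bundler)",
    "composer.json": "PHP (Composer)",
    "Makefile": "Make",
    "Dockerfile": "Docker",
    "docker-compose.yml": "Docker Compose",
}


def find_config_files(root: str) -> dict[str, str]:
    """Return the known config files present directly under `root`."""
    root_path = Path(root)
    return {
        name: ecosystem
        for name, ecosystem in CONFIG_FILES.items()
        if (root_path / name).is_file()
    }
```

A call such as find_config_files(".") only gives a starting list; versions, libraries, and architectural style still have to be read from the files themselves.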
Gnapsis Setup:
- Run project_overview to get the current ontology state and available category IDs.
- Note all category IDs; you will need them for every create_entity call.
Output:
Summarize findings to the user. Write docs/stack.md only if the user requests documentation.
After summarizing, proceed to Phase 2.
PHASE 2: KNOWLEDGE GRAPH CONSTRUCTION (Gnapsis Derivation Order)
Build the knowledge graph layer by layer following the strict gnapsis scope hierarchy:
Domain → Feature → Namespace → Component → Unit
Scope Definitions:
- Domain: A bounded context representing a major business or technical concern (e.g., Authentication, Graph Abstraction, MCP Server).
- Feature: A cohesive capability within a domain (e.g., within Auth: Login, Registration, Password Reset).
- Namespace: A code module or package that implements part of a feature (e.g., services, repositories, mcp::tools).
- Component: A concrete code artifact such as a struct, trait, class, or enum (e.g., UserService, AuthTrait).
- Unit: An atomic functional element such as a function, method, constant, or field (e.g., validate(), MAX_RETRIES).
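The scope hierarchy can be treated as an ordered list in which a child must sit strictly deeper than its parent. The sketch below encodes only that ordering; is_valid_parent is a hypothetical helper, and whether gnapsis allows skipping levels (for example, a Component directly under a Domain) is not stated here, so the check only enforces "deeper than".

```python
# Derivation order from this section, shallowest to deepest.
SCOPES = ["Domain", "Feature", "Namespace", "Component", "Unit"]
DEPTH = {scope: index for index, scope in enumerate(SCOPES)}


def is_valid_parent(parent_scope: str, child_scope: str) -> bool:
    """True when the child scope is strictly deeper than the parent scope."""
    return DEPTH[child_scope] > DEPTH[parent_scope]
```

Checking a planned parent/child pair this way before calling create_entity avoids registering an entity that a later validation pass would flag as a scope violation.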
Process for EACH Domain:
Step 2.1: Domain Discovery
- Analyze directory structure, namespace patterns, and module boundaries.
- Look for domain indicators: directory names, namespace prefixes, bounded context markers, configuration sections.
- Cross-reference with the stack analysis to understand framework-specific conventions.
- Register each domain using create_entity with a Domain-scope category.
Step 2.2: Domain Summary
For each identified domain, briefly log:
- Domain name and description
- Constituent features (enumerated)
- Key entry points and interfaces
- Dependencies on other domains (preliminary)
Then proceed immediately to build the subgraph.
Step 2.3: Full Domain Subgraph Construction
Build the complete subgraph for each domain:
1. Register Features under the domain. For each feature:
   - Use create_entity with a Feature-scope category and parent_ids: [<domain-id>].
   - Identify all code paths that implement this feature.
2. Register Namespaces under each feature. For each namespace:
   - Use create_entity with a Namespace-scope category and parent_ids: [<feature-id>].
   - Map module boundaries and imports/exports.
3. Register Components under each namespace. For each component:
   - First run analyze_document(document_path) to discover LSP symbols.
   - Use create_entity with a Component-scope category, parent_ids: [<namespace-id>], and a code reference using the exact lsp_symbol from analyze_document.
4. Register Units under each component. For each unit:
   - Use create_entity with a Unit-scope category and parent_ids: [<component-id>].
   - Reference the exact LSP symbol from analyze_document.
5. Register relationships using update_entity commands:
   - { type: "relate", entity_id: "...", note: "..." } for semantic relationships.
   - { type: "link", entity_id: "...", link_type: "calls" } for code-level links.
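The top-down registration in this step amounts to a depth-first walk in which each newly created ID becomes the parent_ids value for the next level. The sketch below assumes a nested plan of dicts and a create_entity callable that returns the new entity's ID; both are hypothetical stand-ins for however the MCP tool is actually invoked.

```python
from typing import Any, Callable, Optional


def register_tree(
    node: dict[str, Any],
    create_entity: Callable[..., str],
    parent_id: Optional[str] = None,
) -> list[str]:
    """Register `node`, then its children, threading the new ID down as parent.

    Assumption: `create_entity` returns the ID of the entity it created.
    """
    entity_id = create_entity(
        name=node["name"],
        description=node["description"],
        category_ids=node["category_ids"],
        parent_ids=[parent_id] if parent_id else [],
        commands=node["commands"],  # at least one reference per entity
    )
    created = [entity_id]
    for child in node.get("children", []):
        created.extend(register_tree(child, create_entity, entity_id))
    return created
```

Because a parent is always registered before its children, every parent_ids value refers to an entity that already exists in the graph.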
Step 2.4: Repeat for All Domains
Proceed domain by domain. After completing all domains, summarize progress and proceed to Phase 3.
PHASE 3: INTER-DOMAIN RELATIONSHIP ANALYSIS AND OPTIMIZATION
After all domain subgraphs are built:
Step 3.1: Cross-Domain Dependency Scan
- Trace all imports, calls, events, and data flows that cross domain boundaries.
- Identify shared models, common utilities, and cross-cutting concerns.
- Map integration points: API calls between domains, shared database tables, event bus topics.
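Once namespaces are assigned to domains, the cross-domain scan reduces to filtering import or call edges whose endpoints belong to different domains. The sketch below assumes the edges have already been collected as (source, target) namespace pairs; cross_domain_edges and the input shapes are illustrative, not gnapsis output.

```python
def cross_domain_edges(
    edges: list[tuple[str, str]],
    domain_of: dict[str, str],
) -> list[tuple[str, str, str, str]]:
    """Return (source, target, source_domain, target_domain) for boundary-crossing edges.

    Edges with an endpoint missing from `domain_of` are skipped, since they
    cannot be attributed to a domain yet.
    """
    crossing = []
    for source, target in edges:
        source_domain = domain_of.get(source)
        target_domain = domain_of.get(target)
        if source_domain is None or target_domain is None:
            continue
        if source_domain != target_domain:
            crossing.append((source, target, source_domain, target_domain))
    return crossing
```

Each returned tuple is a candidate for a relate or link command in the next step.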
Step 3.2: Register Inter-Domain Relationships
For each cross-domain relationship, use update_entity with:
- { type: "relate", entity_id: "<target>", note: "description of relationship" } for semantic links.
- { type: "link", entity_id: "<target>", link_type: "calls|imports|implements|instantiates" } for code links.
Step 3.3: Coupling and Cohesion Analysis
- Cohesion check: Are all elements within each domain/feature/namespace closely related? Flag low-cohesion areas.
- Coupling check: Are inter-domain dependencies minimal and well-defined? Flag high-coupling areas.
- Identify architectural smells: circular dependencies, god modules, feature envy, shotgun surgery.
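Two of these checks are easy to make concrete over a list of domain-to-domain dependency edges: fan-out as a rough coupling measure, and a depth-first search for circular dependencies. This is a sketch with hypothetical helper names (fan_out, find_cycle); what counts as "high" coupling remains a judgment call.

```python
from collections import defaultdict
from typing import Optional


def fan_out(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Count the distinct other domains each domain depends on."""
    targets = defaultdict(set)
    for source, target in edges:
        if source != target:
            targets[source].add(target)
    return {domain: len(deps) for domain, deps in targets.items()}


def find_cycle(edges: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a path (first node repeated last), or None."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every node starts as unvisited
    path = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = in_progress
        path.append(node)
        for nxt in graph.get(node, []):
            if state[nxt] == in_progress:
                # Back edge: the cycle is the path from nxt to here, closed.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for node in list(graph):
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None
```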
Step 3.4: Graph Validation
Run validate_graph() to check for:
- Orphan entities (no parent at non-Domain scope)
- Cycles in BELONGS_TO relationships
- Scope violations (child scope not deeper than parent)
- Entities without references
- Entities without classification
Fix any issues found.
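To make the first and last checks concrete, the sketch below runs them over an in-memory list of planned entities. It is not how validate_graph is implemented; the entity fields (scope, parent_ids, references, category_ids) and the precheck helper are assumptions chosen for illustration.

```python
def precheck(entities: list[dict]) -> dict[str, list[str]]:
    """Flag orphans, entities without references, and entities without categories."""
    issues = {"orphans": [], "no_references": [], "no_categories": []}
    for entity in entities:
        name = entity["name"]
        # Only Domain-scope entities may exist without a parent.
        if entity["scope"] != "Domain" and not entity.get("parent_ids"):
            issues["orphans"].append(name)
        if not entity.get("references"):
            issues["no_references"].append(name)
        if not entity.get("category_ids"):
            issues["no_categories"].append(name)
    return issues
```

Running a check like this on the plan before registration reduces the number of issues left for validate_graph() to report.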
PROGRESS TRACKING
- At the start of each phase, announce what you're about to do.
- After completing each domain, summarize what was registered in the graph (entity counts by scope).
- After completing all phases, provide a final summary.
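The per-domain summary of entity counts by scope can be produced with a simple tally. The sketch below assumes each registered entity is tracked locally as a dict with a "scope" key; summarize_counts is a hypothetical helper.

```python
from collections import Counter

SCOPES = ["Domain", "Feature", "Namespace", "Component", "Unit"]


def summarize_counts(entities: list[dict]) -> str:
    """Return a one-line summary of entity counts in derivation order."""
    counts = Counter(entity["scope"] for entity in entities)
    return ", ".join(f"{scope}: {counts[scope]}" for scope in SCOPES)
```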
QUALITY ASSURANCE
Before declaring completion:
- Run validate_graph() and fix all reported issues.
- Verify every source file has been analyzed via analyze_document.
- Cross-check the graph against the directory structure for completeness using find_entities at each scope.
- Ensure all relationship types are consistent and accurately labeled.
- Confirm the derivation order (Domain → Feature → Namespace → Component → Unit) is respected throughout.
- Present a final summary with graph statistics from project_overview.
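The "every source file has been analyzed" check is a set difference between the source files on disk and the paths already passed to analyze_document. The sketch below assumes those paths were recorded as root-relative POSIX strings; unanalyzed_files and the default extension tuple are illustrative and should be adjusted to the project's languages.

```python
from pathlib import Path


def unanalyzed_files(
    root: str,
    analyzed: set[str],
    extensions: tuple[str, ...] = (".rs",),
) -> list[str]:
    """Return source files under `root` that are missing from `analyzed`."""
    root_path = Path(root)
    on_disk = {
        path.relative_to(root_path).as_posix()
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix in extensions
    }
    return sorted(on_disk - analyzed)
```

An empty result means every matching file has been analyzed at least once; it does not confirm that every symbol in those files was registered.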
The final deliverable is the complete knowledge graph in gnapsis. Every node, every edge, every relationship must be registered through the gnapsis MCP tools.