repo-indexer - SKILL.md Agent Skill

name: repo-indexer description: Indexes and documents a codebase for persistent Claude context with minimal token overhead. Use when asked to index a repo, understand a codebase, create CLAUDE.md, set up Claude memory, bootstrap context, onboard to a project, or document codebase for Claude. Triggers on phrases like "index this repo", "understand this codebase", "set up context", "create project memory", "help me onboard". Creates tiered memory system using Claude native memory + minimal boot files + on-demand loading + conversation history as knowledge store. Do NOT use for general code questions, debugging, or tasks unrelated to codebase indexing/documentation. allowed-tools: Bash, Read, Write, Glob, Grep argument-hint: [path] license: MIT metadata: version: 0.0.4 author: JayaShankar Mangina

Repo Indexer

Indexes codebases with minimal context window overhead using tiered memory.

Getting Started

Prerequisites: Python 3.9+. Run from the project root directory.

Memory Architecture

L0: Claude Native Memory  → repo roster, patterns (~100 tokens, auto)
L1: CLAUDE.md             → boot loader only (<500 tokens, auto-load)
L2: .claude/memory/*.md   → deep context (on-demand, explicit load)
L3: Conversation History  → full analysis (searchable, 0 cost until used)

Token budgets: Native Memory ~100–300 | CLAUDE.md <500 | memory/*.md <10,000 total | Past chats: 0 until searched

L2 file loading guide: Architecture decisions → architecture.md | Code style → conventions.md | Unknown terms → glossary.md

Task Progress

Use TodoWrite to track each phase dynamically:

Phase 1: Detect repo type
Phase 2: Analyze codebase (9 areas)
Phase 3: Generate output files
Phase 4: Validate token budgets
Phase 5: Suggest memory update

Workflow

Phase 1: Detect Repo Type

python3 scripts/detect-repo-type.py "$ARGUMENTS"

Phase 2: Index

Analyze systematically:

Config: package.json, pyproject.toml, Cargo.toml, go.mod
Entry points: main files, CLI, server bootstrap
Structure: directory layout to depth 3
Core modules: business logic, services, models
API surface: routes, endpoints, schemas
Data layer: models, migrations, ORM
External deps: third-party integrations
Build/deploy: Dockerfile, CI/CD, Makefile
Tests: structure, fixtures, patterns

Before generating files, present the proposed .claude/ structure to the user for confirmation.

Phase 3: Generate Output

Output to conversation (L3):

Full analysis using format in references/templates.md → "Indexing Output Format". Include ### SEARCH KEYWORDS for retrieval.

Select CLAUDE.md template by repo type:

Use the type-specific variant from references/templates.md:

Monorepo → "CLAUDE.md — Monorepo variant" (packages list, workspace commands)
Library → "CLAUDE.md — Library variant" (public API section, publish commands)
Microservices → "CLAUDE.md — Microservices variant" (services table, compose commands)
Single App → base "CLAUDE.md" template

Create files:

.claude/
├── memory/
│   ├── architecture.md   # From references/templates.md
│   ├── conventions.md
│   └── glossary.md
├── plans/                # Empty, user-managed
└── checkpoints/          # Empty, user-managed

CLAUDE.md                 # At repo root, <500 tokens

Phase 4: Validate

python3 scripts/estimate-tokens.py

Must pass: CLAUDE.md < 500 tokens, all memory files within budget.

If validation fails:

Move content from CLAUDE.md to .claude/memory/ files
Re-run scripts/estimate-tokens.py
Repeat until all files pass their budget

Phase 5: Memory Update

python3 scripts/generate-memory-update.py

Suggest user add to Claude's native memory:

Repo: {name} | Type: {type} | Stack: {stack}
{name} indexed {date} | Key: {modules}

Examples

User: "Index this repo"

Run detect-repo-type.py (Phase 1)
Analyze all 9 areas (Phase 2)
Output full analysis to conversation + create .claude/ structure (Phase 3)
Validate token budgets (Phase 4)
Suggest native memory update (Phase 5)

User: "Help me understand this codebase"

Check Claude memory for prior indexing
Search past chats: "{repo-name} architecture"
If not found: run full indexing workflow

If .claude/ Exists

Load existing files
Compare with current codebase
Flag inconsistencies
Update incrementally
Preserve  sections

Error Handling

Common issues:

Python version error → requires Python 3.9+: python3 --version or which python3

Critical Rules

CLAUDE.md hard limit: 500 tokens
Full analysis goes in conversation, not files
Files are pointers, not stores
Always suggest native memory update
Include search keywords in output