repo-archaeologist

Use when setting up a new repo for TKS team, onboarding a legacy project, or when institutional knowledge is missing/outdated. Triggers: 'document this repo', 'new project setup', 'knowledge extraction', 'what does this codebase do', missing DOMAIN_MAP.md or TRAPS.md or DECISIONS_LOG.md in docs/institutional-memory/.

By mduongvandinh · Updated 5/14/2026

Repo Archaeologist

Overview

Extract institutional knowledge from a codebase that only exists in people's heads or scattered across git history. Output 3 living documents that protect against 属人化 (knowledge trapped in individuals).

Core principle: Read code + git history + bug tickets → generate knowledge docs → dev reviews in 5 minutes. Dev never writes documentation from scratch.

When to Use

  • New repo joining TKS portfolio
  • Legacy repo with no documentation
  • Quarterly refresh of existing docs
  • Before a key developer rotates off project
  • NOT for: routine PR documentation (use /knowledge-capture instead)

Prerequisites

  • GitHub CLI (gh) authenticated
  • ClickUp API access (optional, enriches output)
  • Repo cloned locally with full git history

Process

Phase 1: Gather Raw Data

First, detect project type and set variables:

# Auto-detect source root and file extension (heuristic; adjust for other layouts)
if [ -d "src/main/java" ]; then
  SRC_ROOT="src/main/java"
  TEST_ROOT="src/test/java"
  FILE_EXT="*.java"
elif [ -d "src" ]; then
  SRC_ROOT="src"
  TEST_ROOT="src"
  FILE_EXT="*.ts"
else
  SRC_ROOT="."
  TEST_ROOT="."
  FILE_EXT="*.py"
fi

# Auto-detect build tool
if [ -f "build.gradle" ] || [ -f "build.gradle.kts" ]; then
  BUILD_TOOL="gradle"
elif [ -f "pom.xml" ]; then
  BUILD_TOOL="maven"
elif [ -f "package.json" ]; then
  BUILD_TOOL="npm"
else
  BUILD_TOOL="unknown"
fi

# Use OS-appropriate temp dir
TMPDIR="${TMPDIR:-/tmp}"

Then gather data:

# 1. PR history with descriptions (decisions live here)
#    The filter is wrapped in [...] so the file is a single valid JSON array
gh pr list --state merged --json number,title,body,mergedAt,author,files \
  --limit 500 --jq '[.[] | select(.body | length > 50)]' > "$TMPDIR/pr-history.json"

# 2. Code churn — who touched what, how often
git log --numstat --since="12 months ago" --format="COMMIT:%H|%ae|%aI|%s" \
  > "$TMPDIR/git-churn.txt"

# 3. Bug-related commits. Multiple --grep patterns are ORed by default
#    (AND requires --all-match); --all searches every ref, -i ignores case.
#    "fix" also matches "hotfix" and "bugfix", so no separate pattern is needed.
git log --all --oneline -i --grep="fix" --grep="bug" --since="12 months ago" \
  > "$TMPDIR/bug-commits.txt"

# 4. Code comments containing warnings, using the auto-detected SRC_ROOT
#    (-E enables portable alternation with |)
grep -rnE "FIXME|HACK|WORKAROUND|DO NOT|注意|TODO.*careful|WARNING" \
  --include="$FILE_EXT" "$SRC_ROOT" > "$TMPDIR/warning-comments.txt" 2>/dev/null || true

# 5. Config files for environment context (YAML, .properties, .env, Docker)
#    ".env*" already covers ".env.example"
find . \( -name "application*.yml" -o -name "application*.properties" \
  -o -name "application*.yaml" -o -name ".env*" \
  -o -name "docker-compose*" -o -name "Dockerfile" \) \
  -not -path "*/build/*" -not -path "*/node_modules/*" \
  | head -20 > "$TMPDIR/config-files.txt"

Note on ClickUp: ClickUp integration is OPTIONAL. If no ClickUp API token is available, skip bug history from ClickUp and rely solely on git history (step 3 above). The skill MUST work without ClickUp — git history alone provides sufficient data for a useful output.
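
The churn log from step 2 is raw; a per-file summary makes the hot spots easier to see. The following is a minimal sketch, assuming the file has exactly the `COMMIT:` header plus `--numstat` layout produced above. The function name `churn_summary` and the default of 20 rows are illustrative, not part of the skill.

```shell
# Summarize a git-churn.txt file: commits and changed lines per file,
# most frequently touched files first.
# Usage: churn_summary <churn-file> [max-rows]
churn_summary() {
  awk -F'\t' '
    /^COMMIT:/ { next }                      # header line written by --format
    NF == 3 {                                # numstat line: added, deleted, path
      commits[$3]++
      if ($1 != "-") lines[$3] += $1 + $2    # "-" marks a binary file
    }
    END {
      for (f in commits) printf "%d\t%d\t%s\n", commits[f], lines[f], f
    }
  ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2nr | head -n "${2:-20}"
}
```

Calling `churn_summary "$TMPDIR/git-churn.txt" 10` would list the ten most frequently changed files as `commits<TAB>lines<TAB>path`. Renamed files appear under git's `old => new` path notation and may need manual merging.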

Phase 2: Generate Documents

Create docs/institutional-memory/ directory. Generate 3 files:

File 1: DECISIONS_LOG.md

What to extract from PRs:

  • Filter PRs where body contains: "because", "instead of", "trade-off", "decided", "chosen", "なぜ", "理由", "選定"
  • Classify each: Architecture Decision | Business Rule Change | Performance Optimization | Bug Root Cause
  • For each decision: WHEN (date) → WHO (author) → WHAT (change) → WHY (reasoning) → IMPACT (what breaks if reversed)
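
The keyword filter in the first bullet can be sketched as a small shell predicate. This is only an illustration: the function name `has_decision_keyword` is hypothetical, matching is case-insensitive by assumption, and a keyword hit only marks a PR as a candidate that still needs classification.

```shell
# Keywords copied from the list above; extend them to match the team's PR habits.
DECISION_KEYWORDS='because|instead of|trade-off|decided|chosen|なぜ|理由|選定'

# Succeeds (exit 0) when the PR body read from stdin contains a decision keyword.
has_decision_keyword() {
  grep -Eiq -e "$DECISION_KEYWORDS"
}

# Usage for a single PR (the number 123 is a placeholder):
#   gh pr view 123 --json body --jq .body | has_decision_keyword && echo "candidate"
```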

Structure:

# Decision Log — [Project Name]
Auto-generated by /repo-archaeologist | Last scan: [date]
Repo: [repo-url] | PRs analyzed: [count] | Period: [date range]

## How to use this file
- Before making architectural changes: search here first
- Before reverting old code: check if there's a decision explaining WHY
- After making new decisions: /knowledge-capture will auto-update this file

## Architecture Decisions
### DEC-001: [Short title]
- **When**: [date] (PR #[number], by @[author])
- **Context**: [Why this decision was needed — business driver]
- **Choice**: [What was chosen]
- **Alternatives considered**: [What was rejected and why]
- **Constraint**: [External constraint if any — client requirement, legacy system, regulation]
- **⚠️ Reversal risk**: [What breaks if someone changes this without understanding context]

Critical rule: Every decision MUST have "Reversal risk". This is the most valuable field — it's what prevents the next developer from breaking things.
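
The rule above can be checked mechanically before the review. The sketch below assumes entries follow the template exactly (a `### DEC-` heading followed by bullet fields); `check_reversal_risk` is an illustrative name, not an existing command.

```shell
# Print the heading of every DEC entry that lacks a "Reversal risk" field.
# Exit status is 1 when at least one entry is incomplete, 0 otherwise.
# Usage: check_reversal_risk docs/institutional-memory/DECISIONS_LOG.md
check_reversal_risk() {
  awk '
    function report() {
      if (entry != "" && !has_risk) { print entry; missing = 1 }
    }
    /^### DEC-/     { report(); entry = $0; has_risk = 0; next }
    /^#/            { report(); entry = "" }   # any other heading ends the entry
    /Reversal risk/ { has_risk = 1 }
    END             { report(); exit missing }
  ' "$1"
}
```

The check only confirms that the field exists; whether the stated risk is accurate remains a question for the reviewing developer.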

File 2: DOMAIN_MAP.md

What to extract from code:

  • Package/module structure → business domain mapping
  • Database migration files → domain model understanding
  • Scheduled jobs/batch configs → operational context
  • API endpoints → user-facing functionality
  • External service integrations → dependency awareness

Structure:

# Business Domain Map — [Project Name]
Read this BEFORE reading any code.

## Client Profile
- Company: [name], Industry: [industry]
- End users: [who uses the system and why]
- Peak seasons: [when workload spikes — CRITICAL for deploy planning]
- Escalation pattern: [what makes client call/email urgently]

## Module Map
### `[module-name]/` — [Business meaning in Vietnamese]
- **Japanese term**: [日本語の業務用語]
- **What it does for the business**: [2-3 sentences, no code jargon]
- **Trigger**: [What causes this module to run — user action, schedule, event]
- **If it fails**: [Business impact + escalation level: 低/中/高]
- **Key files**: [3-5 most important files]
- **Domain expert on team**: [@person] (backup: [@person])
- **Connected to**: [Other modules that depend on or feed into this]

## Data Flow
[Describe how data moves through the system in business terms]
Example: "Client's customer uploads CSV → system validates → calculates tax → 
generates 請求書 → sends to client for approval → client sends to their customer"

## External Dependencies
| System | Purpose | Owner | SLA | Risk if down |
|--------|---------|-------|-----|-------------|

Critical rule: Write in Vietnamese for team, keep Japanese business terms (日本語) as-is. A dev who reads this file should understand the BUSINESS before touching code.

File 3: TRAPS.md

What to scan for:

| Trap Category | Detection Method |
|---------------|------------------|
| Timezone | Search for `LocalDateTime.now()`, `new Date()`, `datetime.now()` without timezone |
| Rounding | Search for `BigDecimal`, `Math.round`, `toFixed` near money/tax logic |
| Encoding | Search for charset, encoding, Shift-JIS, Windows-31J, CSV export |
| Hardcoded config | Search for hardcoded URLs, IPs, magic numbers in business logic |
| External API quirks | Search for `Thread.sleep`, retry logic, rate limit comments |
| Date edge cases | Search for month-end, fiscal year, leap year logic |
| Multi-tenant leaks | Search for missing tenant filters in queries |
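
Several of these searches translate directly into `grep` calls. The sketch below covers four of the categories; the regular expressions, the file extensions, and the function names `scan_category` and `scan_traps` are assumptions chosen for illustration. The remaining categories (hardcoded config, date edge cases, tenant filters) usually need reading the code rather than a pattern match.

```shell
# Print "category<TAB>file:line:text" for every match of one pattern.
# Usage: scan_category <category> <extended-regex> <dir>
scan_category() {
  { grep -rnE --include='*.java' --include='*.ts' --include='*.py' \
      -e "$2" "$3" 2>/dev/null || true; } \
    | awk -v label="$1" '{ print label "\t" $0 }'
}

# Run the pattern-friendly trap searches over a source tree.
# Usage: scan_traps [dir]
scan_traps() {
  root="${1:-.}"
  # Empty parentheses mean no zone or clock argument was passed.
  scan_category "Timezone" 'LocalDateTime\.now\(\)|new Date\(\)|datetime\.now\(\)' "$root"
  scan_category "Rounding" 'BigDecimal|Math\.round|\.toFixed\(' "$root"
  scan_category "Encoding" 'charset|encoding|Shift[-_]JIS|Windows-31J' "$root"
  scan_category "External API quirks" 'Thread\.sleep|retry|rate.?limit' "$root"
}
```

Every hit is only a candidate: a match on `BigDecimal` is a trap only if it sits near money or tax logic, so the output is meant as a reading list for the reviewer rather than a finished TRAPS.md.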

Also extract from:

  • Bug commits that took > 1 day to resolve (git log + ClickUp cross-reference if available)
  • Code comments with WARNING/FIXME/HACK/注意
  • Files with > 5 hotfixes in same area (count fix commits per file with git log; git blame helps locate the affected lines)
  • Dependency health (based on auto-detected build tool):
    • Gradle: ./gradlew dependencyUpdates (if plugin available) or read build.gradle
    • Maven: mvn versions:display-dependency-updates
    • npm: npm outdated
    • If dependency check command unavailable, skip — this is optional enrichment
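
The "more than 5 hotfixes" check can be approximated by counting how many fix commits touched each file. This is a rough sketch under the assumption that fix commits mention "fix" or "bug" in their message; `hotfix_hotspots` is an illustrative name, and it counts per file rather than per code area.

```shell
# Read file paths from stdin (one line per fix commit that touched the file)
# and print the files touched by more than THRESHOLD fix commits.
# Usage: ... | hotfix_hotspots [threshold]
hotfix_hotspots() {
  threshold="${1:-5}"
  awk -v min="$threshold" '
    NF { count[$0]++ }          # blank lines separate commits; skip them
    END {
      for (f in count) if (count[f] > min) printf "%d\t%s\n", count[f], f
    }
  ' | sort -rn
}

# Usage against the repository (an empty --format leaves only the file names):
#   git log --since="12 months ago" -i --grep="fix" --grep="bug" \
#     --name-only --format="" | hotfix_hotspots 5
```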

Structure:

# ⚠️ Trap Map — [Project Name]
READ THIS BEFORE WRITING CODE. Every trap here caused a real bug.

## 🔴 Critical Traps (Mistake = production bug)
### TRAP-001: [Short name]
- **Files**: [Exact file paths]
- **Wrong way**: [Code that looks right but is wrong]
- **Right way**: [Correct code with explanation]
- **Why it's a trap**: [Why the wrong way seems reasonable]
- **History**: [Bug ticket or incident that revealed this trap]
- **Detection**: [How to catch this in code review]

## 🟡 Caution Traps (Mistake = wasted time)
[Same structure, lower severity]

## 🟢 Known Workarounds (Ugly but necessary)
[Document workarounds that look like bugs but are intentional]

Critical rule: Every trap MUST have "Wrong way" and "Right way" as actual code. Abstract descriptions are useless — show the exact code pattern.

Phase 3: Validate

After generating all 3 files:

  1. Run wc -l on each file — DOMAIN_MAP should be longest, TRAPS shortest
  2. Cross-check: Every module in DOMAIN_MAP should appear in at least one DECISIONS_LOG entry
  3. Cross-check: At least 50% of TRAPS entries should reference a real bug/PR number
  4. Ask dev to review: "Anything critical missing? Anything wrong?" — target 5-minute review
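
Step 3 can be automated in a few lines. The sketch below assumes that a "real bug/PR number" appears as `#` followed by digits somewhere in the entry; ClickUp ticket IDs or other formats would need a different pattern. The function name `check_trap_references` is hypothetical.

```shell
# Count TRAP entries and how many cite a bug/PR number such as "#123".
# Prints "referenced/total" and fails when fewer than half are referenced.
# Usage: check_trap_references docs/institutional-memory/TRAPS.md
check_trap_references() {
  awk '
    function finish() {
      if (in_trap) { total++; if (has_ref) referenced++ }
    }
    /^### TRAP-/         { finish(); in_trap = 1; has_ref = 0; next }
    /^#/                 { finish(); in_trap = 0 }   # other headings end the entry
    in_trap && /#[0-9]+/ { has_ref = 1 }
    END {
      finish()
      printf "%d/%d\n", referenced, total
      exit ((total > 0 && referenced * 2 < total) ? 1 : 0)
    }
  ' "$1"
}
```

A failing result is a prompt to look up the missing ticket or PR numbers, not a reason to delete the traps.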

Phase 4: Commit

git add docs/institutional-memory/
git commit -m "docs: generate institutional memory via /repo-archaeologist

Generated:
- DECISIONS_LOG.md: [N] decisions from [M] PRs
- DOMAIN_MAP.md: [N] modules documented
- TRAPS.md: [N] traps identified

Review: [senior dev name] confirmed accuracy"

Common Mistakes

| Mistake | Fix |
|---------|-----|
| Writing code documentation instead of business knowledge | Ask "would a PM understand this?" If not, rewrite |
| Missing "Reversal risk" on decisions | Every decision needs this; it's the whole point |
| Traps without code examples | Abstract trap = useless trap. Show wrong vs right code |
| Trying to document everything | Focus on top 20% most critical modules first |
| Domain map uses developer jargon | Write for a new developer who has ZERO context |

Refresh Schedule

  • Full rescan: Quarterly, or when project scope changes significantly
  • Incremental updates: Handled by /knowledge-capture on every PR
  • Trap additions: Immediately after any production bug
Install via CLI
npx skills add https://github.com/mduongvandinh/repo-archaeologist --skill repo-archaeologist