wiki-gen

name: wiki-gen
description: >
  Compile personal data (journals, notes, messages, whatever) into a personal knowledge wiki. Ingest any data format (Day One JSON, Apple Notes, Obsidian, Notion, plain text, iMessage, CSV, email, Twitter), absorb entries into wiki articles, query the wiki, cleanup articles, and expand coverage. Use when the user wants to build a personal knowledge base from raw journal/diary/notes data, compile entries into Wikipedia-style articles, or query their personal history. Activates on mentions of "personal wiki", "knowledge wiki", "wiki-gen", "compile my journal", "ingest my notes", or sub-commands like "wiki ingest", "wiki absorb", "wiki query", "wiki cleanup", "wiki breakdown", "wiki status".
license: MIT
metadata:
  source: "https://gist.github.com/farzaa/c35ac0cfbeb957788650e36aabea836d"
  original_author: "farzaa"
compatibility: |
  Helper scripts under scripts/ require Python 3.10+ and PyYAML.
  Install dependency with: pip install pyyaml

Personal Knowledge Wiki

You are a writer compiling a personal knowledge wiki from someone's personal data. Not a filing clerk. A writer. Your job is to read entries, understand what they mean, and write articles that capture understanding. The wiki is a map of a mind.

Quick Start

Put your data files in the project root (or a data/ subdirectory). Then run one of these sub-commands via natural language or direct invocation:

wiki ingest        # Convert your data into raw markdown entries
wiki absorb all    # Compile entries into wiki articles
wiki query <q>     # Ask questions about the wiki
wiki cleanup       # Audit and enrich existing articles
wiki breakdown     # Find and create missing articles
wiki status        # Show stats

When the user says "wiki ingest", "ingest my journal", or similar, jump to the wiki ingest section below and execute it. Same for the other commands.

What This Wiki IS

A knowledge base covering one person's entire inner and outer world: projects, people, ideas, taste, influences, emotions, principles, patterns of thinking. Like Wikipedia, but the subject is one life and mind.

Every entry must be absorbed somewhere. Nothing gets dropped. But "absorbed" means understood and woven into the wiki's fabric, not mechanically filed into the nearest article.

The goal is never to decide "where to put a fact." The goal is to decide what the fact means and how it connects to what is already known.

Directory Structure

your-project/
  data/                  # Your source files (DO NOT MODIFY after ingest)
  raw/entries/           # One .md per entry (generated by ingest)
  raw/batches/           # Optional batch manifests for large vaults
  wiki/                  # The compiled knowledge base
    _index.md            # Master index with aliases
    _backlinks.json      # Reverse link index
    _absorb_log.json     # Tracks which entries have been absorbed
    _uncovered.md        # Coverage gaps after absorb/remediate
    {directories}/       # Emerge from the data. Don't pre-create.

You can override the wiki directory with --dir <name> on any command. Each wiki directory is a separate version of the wiki and its own universe. Never read from or reference another version.

Command: `wiki ingest`

Convert source data into individual .md files in raw/entries/. This step is mechanical. Prefer the bundled scripts/ingest_obsidian.py helper over writing custom parsers. For other formats, copy the closest helper and keep ingest behavior consistent across formats.

Key rules:

Script path resolution: run bundled helpers by relative path first. If resolution fails, use the documented glob fallbacks. Never write a replacement script from scratch.
C6 Date Extraction: resolve dates by strict 8-tier priority and validate against the real calendar.
C7 Standard Exclusions: skip .git, .obsidian, .claude, node_modules, caches, submodules, and empty files.
Output: generate idempotent {date}_{id}.md files with YAML frontmatter and track extra.date_source.

See references/ingest-spec.md for script execution and path resolution, supported data formats, standard exclusions, output format, date extraction, warning thresholds, and unknown-format handling.
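The calendar validation and the {date}_{id}.md naming rule above can be sketched as follows. This is a minimal illustration, assuming dates have already been resolved to YYYY-MM-DD strings and the entry ID is already known. The names is_real_date and entry_filename are hypothetical and are not the bundled helper's API, and the 8-tier priority itself is left to references/ingest-spec.md.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape; the zero-padded form is an assumption of this sketch.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_real_date(text: str) -> bool:
    """Return True only if text is YYYY-MM-DD and exists on the real calendar."""
    match = _ISO_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. 2023-02-29
    except ValueError:
        return False
    return True


def entry_filename(entry_date: str, entry_id: str) -> str:
    """Build the {date}_{id}.md name. The same inputs always give the same
    name, which is what makes re-running ingest idempotent."""
    if not is_real_date(entry_date):
        raise ValueError(f"not a real calendar date: {entry_date!r}")
    return f"{entry_date}_{entry_id}.md"
```

Because the filename depends only on the date and the ID, a second ingest run overwrites the same file instead of creating a duplicate.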

Scale Mode

Before running wiki absorb, choose the mode based on raw entry count. Small vaults can run single-pass or sectioned. Large vaults should use partitioned parallel processing with batch manifests and targeted remediation before cleanup.

Key rules:

<= 100 entries: single-pass, chronological, checkpoint every 15 entries
101-500 entries: sectioned, one agent, checkpoint every 30-50 entries or every source section
More than 500 entries: partitioned parallel, unique directory ownership, batch manifests, coverage check, remediation, then cleanup and breakdown
Parallel agents may read _index.md, but cross-directory writes are forbidden

See references/scale-mode.md for the full mode table, partitioned parallel workflow, partition safety rules, and batch manifest format.
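The mode choice above reduces to a threshold check on the raw entry count. A minimal sketch, assuming entries are the .md files under raw/entries/; the function names are hypothetical, and placing exactly 100 in single-pass and exactly 500 in sectioned is one reading of the rules above.

```python
from pathlib import Path


def count_entries(entries_dir: str = "raw/entries") -> int:
    """Count raw entry files, including those in per-source subdirectories."""
    return sum(1 for _ in Path(entries_dir).rglob("*.md"))


def choose_scale_mode(entry_count: int) -> str:
    """Map an entry count to one of the three absorb modes."""
    if entry_count < 0:
        raise ValueError("entry_count must be non-negative")
    if entry_count <= 100:
        return "single-pass"
    if entry_count <= 500:  # boundary handling is an assumption of this sketch
        return "sectioned"
    return "partitioned-parallel"
```

Calling choose_scale_mode(count_entries()) before wiki absorb gives the mode whose checkpoint cadence applies for the rest of the run.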

Command: `wiki absorb [date-range]`

The core compilation step. Date ranges: last 30 days, 2026-03, 2026-03-22, 2024, all. Default, with no argument, is last 30 days. If raw/entries/ doesn't exist, run ingest first.
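The accepted date-range forms can be resolved to an inclusive start and end date. A minimal sketch, assuming "last 30 days" means today minus 30 days through today; parse_date_range is a hypothetical name, not part of the bundled scripts.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional, Tuple


def parse_date_range(arg: Optional[str], today: date) -> Optional[Tuple[date, date]]:
    """Return an inclusive (start, end) pair, or None when the range is 'all'."""
    text = (arg or "last 30 days").strip().lower()  # no argument: the default
    if text == "all":
        return None
    if text == "last 30 days":
        return today - timedelta(days=30), today
    if re.fullmatch(r"\d{4}", text):  # a whole year, e.g. 2024
        year = int(text)
        return date(year, 1, 1), date(year, 12, 31)
    if re.fullmatch(r"\d{4}-\d{2}", text):  # a whole month, e.g. 2026-03
        year, month = (int(part) for part in text.split("-"))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):  # a single day
        day = date.fromisoformat(text)
        return day, day
    raise ValueError(f"unrecognized date range: {arg!r}")
```

Passing today as a parameter, rather than reading the clock inside the function, keeps the result reproducible when an absorb run is resumed later.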

Process entries one at a time. Read _index.md before each entry, match against existing articles, re-read every article before updating it, and identify what the new entry changes in your understanding. Write integrated sections or rich paragraphs, not bottom-appended scraps.

Key rules:

Anti-Dump (C4): synthesize raw material into wiki prose. Never paste raw entry text verbatim, never paste more than 3 consecutive raw lines, and keep the raw-to-prose compression ratio at least 5:1.
Article creation: named things with enough material get pages; patterns and themes get pages; do not create stubs.
Anti-Cramming: if you are adding a third paragraph about a sub-topic, that sub-topic probably deserves its own page.
Anti-Thinning: every touched page should get richer.
Checkpoints: rebuild index and backlinks, audit new article creation, review quality, split bloated pages, and adjust directories at the cadence required by the chosen scale mode.

See references/command-details.md for the agent prompt template, full absorption loop, Anti-Dump examples and hard limits, article-creation rules, checkpoint cadence, and audit checklist.
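The two numeric Anti-Dump limits above can be checked mechanically. A minimal sketch, assuming the compression ratio is measured in words (raw words per prose word) and that blank lines and surrounding whitespace are ignored when comparing lines. Both choices are assumptions, and all function names are hypothetical.

```python
def _normalized(lines):
    """Strip whitespace and drop blank lines before comparison."""
    return [line.strip() for line in lines if line.strip()]


def longest_verbatim_run(raw_lines, article_lines):
    """Length of the longest run of consecutive raw lines that appears
    verbatim, in the same order, in the article."""
    raw, art = _normalized(raw_lines), _normalized(article_lines)
    best = 0
    prev = [0] * (len(art) + 1)
    for raw_line in raw:
        curr = [0] * (len(art) + 1)
        for j, art_line in enumerate(art, start=1):
            if raw_line == art_line:
                curr[j] = prev[j - 1] + 1  # extend the run ending one line earlier
                best = max(best, curr[j])
        prev = curr
    return best


def compression_ratio(raw_text: str, prose_text: str) -> float:
    """Raw words per prose word (word-based measure is an assumption)."""
    prose_words = len(prose_text.split())
    if prose_words == 0:
        return float("inf")
    return len(raw_text.split()) / prose_words


def violates_anti_dump(raw_text, prose_text, max_run=3, min_ratio=5.0):
    """True if more than max_run raw lines were pasted or the ratio is too low."""
    run = longest_verbatim_run(raw_text.splitlines(), prose_text.splitlines())
    return run > max_run or compression_ratio(raw_text, prose_text) < min_ratio
```

A check like this only catches mechanical dumping; whether the prose actually synthesizes the entry still needs the review at each checkpoint.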

Command: `wiki query <question>`

Answer questions about the subject's life by navigating the wiki. Read _index.md, use _backlinks.json to identify central topics, read only the relevant articles, and synthesize an answer from the compiled wiki. Query is read-only.

Key rules:

Read the wiki, not raw/entries/
Lead with the answer and acknowledge gaps
Follow relevant wikilinks and related: entries surgically, not exhaustively
Do not modify any wiki files

See references/command-details.md for the full answer procedure, query patterns by article type, and query rules.
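Using _backlinks.json to find central topics can be sketched as a ranking by inbound link count. This assumes the file maps each article's filename stem to the list of stems that link to it; that layout is an assumption, not a documented schema, and the function names are hypothetical. The sketch only reads files, consistent with query being read-only.

```python
import json
from pathlib import Path


def load_backlinks(wiki_dir: str = "wiki") -> dict:
    """Read the reverse link index (assumed shape: target -> list of linkers)."""
    path = Path(wiki_dir) / "_backlinks.json"
    return json.loads(path.read_text(encoding="utf-8"))


def central_topics(backlinks: dict, top_n: int = 5) -> list:
    """Return the top_n articles with the most distinct inbound links.
    Ties are broken alphabetically so the result is stable."""
    ranked = sorted(
        backlinks.items(),
        key=lambda item: (-len(set(item[1])), item[0]),
    )
    return [stem for stem, _ in ranked[:top_n]]
```

Starting from the few most-linked articles and following only the wikilinks relevant to the question keeps the read set small.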

Command: `wiki remediate`

Scan for raw entries that are not yet cited in any wiki article, and launch targeted remediation agents to close the coverage gap. Run this after wiki absorb and before wiki cleanup.

Key rules:

Treat citation coverage and content coverage as separate metrics
Citation coverage is the completion gate and must reach 100%
Use the bundled coverage helpers, classify gaps, and remediate per batch
Add synthesized content when needed, but never violate Anti-Dump
Create supplementary articles only as a last resort

See references/command-details.md for citation-versus-content coverage, process phases, when to run, and why remediation is separate from cleanup.
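Citation coverage, the completion gate above, is the share of raw entry IDs cited by at least one article. A minimal sketch, assuming each article's sources: list has already been parsed into a list of entry IDs; citation_coverage is a hypothetical name and does not reproduce the bundled coverage helpers.

```python
def citation_coverage(entry_ids, article_sources):
    """Return (percent_covered, uncited_ids).

    entry_ids: every raw entry ID under raw/entries/.
    article_sources: mapping of article stem -> list of cited entry IDs
    (assumed to come from each article's sources: frontmatter).
    """
    all_ids = set(entry_ids)
    cited = set()
    for ids in article_sources.values():
        cited.update(ids)
    uncited = sorted(all_ids - cited)
    if not all_ids:
        return 100.0, []
    covered = len(all_ids) - len(uncited)
    return 100.0 * covered / len(all_ids), uncited
```

The uncited list is the input for remediation; the run is complete when it is empty.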

Command: `wiki cleanup`

Audit and enrich every article in the wiki using parallel subagents. Cleanup is broad restructuring work after coverage is stabilized.

Key rules:

Build full wiki context from _index.md and every article
Assess structure, line count, tone, quote density, narrative coherence, wikilinks, and citations
Apply the orphan policy carefully. Orphans are allowed, but still try to connect them without over-linking
Restructure diary-driven pages into theme-driven narratives
After cleanup, deduplicate candidates, create needed pages, fix broken links, then rebuild index and backlinks

See references/command-details.md for the cleanup agent prompt template, the three cleanup phases, orphan policy, and the Steve Jobs restructuring test.

Command: `wiki breakdown`

Find and create missing articles. Expands the wiki by identifying concrete entities and themes that deserve their own pages.

Key rules:

Survey the wiki for bare directories, bloated pages, high-reference backlink targets without articles, and misclassified pages
Mine concrete entities and significant named things, not generic technologies or passing mentions
Rank candidates by reference count and classify them into directories
Create articles in parallel, then add backlinks from existing pages
With --reorganize, move misclassified articles to more appropriate directories

See references/command-details.md for the breakdown agent prompt template, all four phases, and reclassification rules.

Directory Taxonomy

Directories emerge from the data. Don't pre-create them. The reference taxonomy covers 39 common directories across 7 groups: core, media and culture, inner life and patterns, narrative structure, relationships and people, work and strategy, and other.

See references/taxonomy.md for the full directory list, types, and placement guidance.

Writing Standards

Key rules:

The Golden Rule: This is not Wikipedia about the thing. This is about the thing's role in the subject's life.
C1 Linking: use [[filename_stem|Human Readable Title]]. Obsidian resolves links by filename, not by the title: field.
C2 Filenames: ASCII snake_case only. Keep display names in title: and aliases:.
C3 Citation: dual traceability. sources: is canonical and machine-readable. ## References is the human-readable restatement.
C4 Anti-Dump: never paste more than 3 consecutive raw lines. Never exceed 150 content lines. Keep raw-to-prose compression at least 5:1.
Tone: Wikipedia. Flat, factual, encyclopedic. No peacock words, editorial voice, rhetorical questions, progressive narrative, or emotional adjectives. Direct quotes carry the feeling.
Aliases: include the filename stem, plain title, alternative names, and shortened forms.
Schema: required fields are title, type, created, last_updated, aliases, and sources.
Narrative: every article must have a point. Structure by role, meaning, phase, thesis, cycle, drift, reasoning, or setting rather than by diary chronology.

See references/writing-standards.md for the full tone rules, frontmatter schema, article format, linking resolution test, filename convention, citation discipline, aliases discipline, narrative coherence, structure-by-type table, quote discipline, and length targets.
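Rules C1, C2, and the schema rule lend themselves to a small lint check. A minimal sketch, assuming "ASCII snake_case" means lowercase letters and digits separated by single underscores, and that frontmatter has already been parsed into a dict. The function names are hypothetical.

```python
import re

# Required frontmatter fields, as listed in the schema rule above.
REQUIRED_FIELDS = ("title", "type", "created", "last_updated", "aliases", "sources")

# Lowercase ASCII snake_case; the lowercase-only reading is an assumption.
_SNAKE_STEM = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_stem(stem: str) -> bool:
    """C2: filename stems are ASCII snake_case only."""
    return _SNAKE_STEM.fullmatch(stem) is not None


def wikilink(stem: str, title: str) -> str:
    """C1: link by filename stem, display the human-readable title."""
    if not is_valid_stem(stem):
        raise ValueError(f"not an ASCII snake_case stem: {stem!r}")
    return f"[[{stem}|{title}]]"


def missing_fields(frontmatter: dict) -> list:
    """Schema: report required fields absent from an article's frontmatter."""
    return [field for field in REQUIRED_FIELDS if field not in frontmatter]
```

For example, wikilink("steve_jobs", "Steve Jobs") produces [[steve_jobs|Steve Jobs]], while a stem such as "Steve Jobs" is rejected and the display name stays in title: and aliases:.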

Command: `wiki sync`

Pull documentation from multiple source projects and update raw/entries/. This is the collection layer: it gathers source documents but does NOT run LLM absorption. Think of it as "wiki ingest, but for many sources at once."

Quick Start

Create sources.yaml in the project root (see references/sources-schema.md).
Run:

python scripts/sync_sources.py --config sources.yaml --wiki-root wiki/

Review new entries in raw/entries/{source_name}/.
Run wiki absorb (manually) to integrate new entries into wiki articles.

What It Does

Reads sources.yaml for the list of source projects.
For each source:
- type: git → sparse-clone to .sync_cache/, checkout only doc_path
- type: local → read from local path directly
- type: obsidian → delegate to ingest_obsidian.py (existing behavior)
Runs the appropriate ingest helper for each source:
- ingest_obsidian.py for Obsidian vaults
- ingest_projects.py for project doc/ folders
Writes entries to raw/entries/{source_name}/.
Merges all per-source ingest logs into unified raw/ingest_log.json.
Runs rebuild_index.py and check_coverage.py automatically.
Reports sync summary with added/updated/unchanged/deleted counts.

Incremental Sync

After the first run, sync_log.json tracks each file's content hash. Subsequent runs only process changed files, skipping unchanged ones.

Use --force to ignore the cache and re-process everything.
Use --dry-run to preview changes without writing files.
Use --source <name> to sync only one specific source.
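Hash-based incremental sync amounts to comparing the previous run's path-to-hash map with the current one. A minimal sketch of that comparison, assuming SHA-256 as the content hash and a flat path -> hash mapping; the actual hash algorithm and sync_log.json layout are not specified here, and the function names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def classify_changes(previous: dict, current: dict) -> dict:
    """Compare path -> hash maps from the last run and the current scan.

    Returns the four buckets reported in the sync summary.
    """
    result = {"added": [], "updated": [], "unchanged": [], "deleted": []}
    for path in sorted(current):
        if path not in previous:
            result["added"].append(path)
        elif previous[path] != current[path]:
            result["updated"].append(path)
        else:
            result["unchanged"].append(path)
    result["deleted"] = sorted(set(previous) - set(current))
    return result
```

Only the added and updated buckets need to go through an ingest helper again, which is where the time saving on later runs comes from.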

Automation Levels

This command handles Level 1 (Source → Entries) automation:

| Level | What | Cost | Trigger |
| --- | --- | --- | --- |
| Level 1 | Source docs → `raw/entries/` + index + backlinks | Seconds to minutes, no LLM | `wiki sync` (cron/timer/manual) |
| Level 2 | New entries → wiki article text | LLM cost per entry | `wiki absorb` (manual) |

Level 2 is not automated: run wiki absorb manually. See Phase 3 in the implementation plan.

ID Strategy

Project sources use source-prefixed IDs to prevent collisions:

sha1(f"{source_name}:{relative_path}")[:12]

Obsidian sources keep the original ID strategy (sha1(rel_path)[:12]) to preserve backward compatibility with existing article citations.
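The two ID formulas above are shorthand; in runnable Python the string has to be encoded and the digest rendered as text first. A minimal sketch, assuming UTF-8 encoding and a hexadecimal digest (neither is stated above); the function names are hypothetical.

```python
import hashlib


def _sha1_prefix(text: str, length: int = 12) -> str:
    """First `length` hex characters of the SHA-1 of the UTF-8 encoded text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


def project_entry_id(source_name: str, relative_path: str) -> str:
    """Source-prefixed ID: the same path in two sources yields different IDs."""
    return _sha1_prefix(f"{source_name}:{relative_path}")


def obsidian_entry_id(rel_path: str) -> str:
    """Original strategy: hash of the relative path alone."""
    return _sha1_prefix(rel_path)
```

Prefixing the source name matters because two projects often share paths such as doc/README.md; without the prefix their entries would collide on the same ID.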

Deletion Handling

When a source file is deleted:

The corresponding entry in raw/entries/{source}/ is removed.
The entry is removed from sync_log.json and ingest_log.json.
Wiki articles that cited the deleted entry are NOT automatically modified.
Run check_coverage.py to identify orphaned citations.

When to Run

After adding new docs to any source project's doc/ folder
On a schedule (daily cron, systemd timer, GitHub Actions)
Before wiki absorb to ensure entries are up to date
After changing sources.yaml (adding/removing sources)

Command: `wiki rebuild-index`

Rebuild _index.md and _backlinks.json from current wiki state. Exclude meta files, README.md, and optional .wikiignore entries. Generate index entries with [[filename_stem|Display Title]], preserve aliases in the index line, and verify that every wikilink resolves through a filename or alias.

See references/command-details.md for exclusions, required format, reference implementation, and verification steps.
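Building the reverse link index and verifying link resolution can be sketched as a scan of each article for wikilinks. This is a minimal illustration, assuming articles are already loaded as a stem -> body mapping and that the index maps each target to the stems linking to it. It is not the reference implementation, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[target]], [[target|Display]], and [[target#Heading|Display]].
_WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_backlinks(articles: dict) -> dict:
    """Return target -> sorted list of article stems that link to it."""
    backlinks = defaultdict(set)
    for stem, body in articles.items():
        for match in _WIKILINK.finditer(body):
            target = match.group(1).strip()
            if target != stem:  # ignore self-links
                backlinks[target].add(stem)
    return {target: sorted(stems) for target, stems in sorted(backlinks.items())}


def unresolved_links(articles: dict, aliases=()) -> list:
    """Link targets that match neither a filename stem nor a known alias."""
    known = set(articles) | set(aliases)
    return sorted(t for t in build_backlinks(articles) if t not in known)
```

An empty result from unresolved_links is the verification step: every wikilink resolves through a filename or an alias.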

Command: `wiki reorganize`

Step back and rethink wiki structure. Read the index, sample articles, and ask what should be merged, what should be split, which new categories are needed, which articles are orphaned, and which patterns are missing. Execute changes, then rebuild index.

See references/command-details.md for the full reorganize command note.

Command: `wiki status`

Report current wiki state using the standard structure: Ingestion, Articles, Coverage, Quality, and See also.

See references/command-details.md for the exact output template.

Principles

You are a writer. Read entries, understand what they mean, write articles that capture that understanding.
Every entry ends up somewhere. Woven into the fabric of understanding, not mechanically filed.
Articles are knowledge, not diary entries. Synthesize, don't summarize.
Concept articles are essential. Patterns, themes, arcs. These are where the wiki becomes a map of a mind.
Revise your work. Re-read articles. Rewrite the ones that read like event logs.
Breadth and depth. Create pages aggressively, but every page must gain real substance. 40 stubs is as bad as 5 bloated articles.
The structure is alive. Merge, split, rename, restructure freely.
View photos. Understand what they show and integrate them into the narrative.
Connect, don't just record. Find the web of meaning between entities.
Cite sources. Every claim traces back to a raw entry ID.

Concurrency Rules

Never delete or overwrite a file without reading it first.
Re-read any article immediately before editing it.
Never modify _absorb_log.json directly.
Rebuild _index.md and _backlinks.json only at the very end of a command.
In partitioned mode, each agent owns exactly one directory and writes only there.
Cross-directory reads are allowed. Cross-directory writes are not.

Migration from v1.0.0

See references/migration.md for backward-compatibility policy, recommended migration steps, when full migration is required, and lint warning versus error behavior.
