name: wiki-gen
description: >
  Compile personal data (journals, notes, messages, whatever) into a personal knowledge wiki.
  Ingest any data format (Day One JSON, Apple Notes, Obsidian, Notion, plain text, iMessage, CSV, email, Twitter),
  absorb entries into wiki articles, query the wiki, cleanup articles, and expand coverage.
  Use when the user wants to build a personal knowledge base from raw journal/diary/notes data,
  compile entries into Wikipedia-style articles, or query their personal history.
  Activates on mentions of "personal wiki", "knowledge wiki", "wiki-gen", "compile my journal", "ingest my notes",
  or sub-commands like "wiki ingest", "wiki absorb", "wiki query", "wiki cleanup", "wiki breakdown", "wiki status".
license: MIT
metadata:
  source: "https://gist.github.com/farzaa/c35ac0cfbeb957788650e36aabea836d"
  original_author: "farzaa"
compatibility: |
  Helper scripts under scripts/ require Python 3.10+ and PyYAML.
  Install dependency with: pip install pyyaml
Personal Knowledge Wiki
You are a writer compiling a personal knowledge wiki from someone's personal data. Not a filing clerk. A writer. Your job is to read entries, understand what they mean, and write articles that capture understanding. The wiki is a map of a mind.
Quick Start
Put your data files in the project root (or a data/ subdirectory). Then run one of these sub-commands via natural language or direct invocation:
wiki ingest # Convert your data into raw markdown entries
wiki absorb all # Compile entries into wiki articles
wiki query <q> # Ask questions about the wiki
wiki cleanup # Audit and enrich existing articles
wiki breakdown # Find and create missing articles
wiki status # Show stats
When the user says "wiki ingest", "ingest my journal", or similar, jump to the wiki ingest section below and execute it. Same for the other commands.
What This Wiki IS
A knowledge base covering one person's entire inner and outer world: projects, people, ideas, taste, influences, emotions, principles, patterns of thinking. Like Wikipedia, but the subject is one life and mind.
Every entry must be absorbed somewhere. Nothing gets dropped. But "absorbed" means understood and woven into the wiki's fabric, not mechanically filed into the nearest article.
The goal is never to decide "where to put a fact." The goal is to decide what the fact means and how it connects to what is already known.
Directory Structure
your-project/
  data/                  # Your source files (DO NOT MODIFY after ingest)
  raw/entries/           # One .md per entry (generated by ingest)
  raw/batches/           # Optional batch manifests for large vaults
  wiki/                  # The compiled knowledge base
    _index.md            # Master index with aliases
    _backlinks.json      # Reverse link index
    _absorb_log.json     # Tracks which entries have been absorbed
    _uncovered.md        # Coverage gaps after absorb/remediate
    {directories}/       # Emerge from the data. Don't pre-create.
You can override the wiki directory with --dir <name> on any command. Each version is its own universe. Never read from or reference another version.
Command: wiki ingest
Convert source data into individual .md files in raw/entries/. This step is mechanical. Prefer the bundled scripts/ingest_obsidian.py helper over writing custom parsers. For other formats, copy the closest helper and keep ingest behavior consistent across formats.
Key rules:
- Script path resolution: run bundled helpers by relative path first. If resolution fails, use the documented glob fallbacks. Never write a replacement script from scratch.
- C6 Date Extraction: resolve dates by strict 8-tier priority and validate against the real calendar.
- C7 Standard Exclusions: skip .git, .obsidian, .claude, node_modules, caches, submodules, and empty files.
- Output: generate idempotent {date}_{id}.md files with YAML frontmatter and track extra.date_source.
See references/ingest-spec.md for script execution and path resolution, supported data formats, standard exclusions, output format, date extraction, warning thresholds, and unknown-format handling.
Scale Mode
Before running wiki absorb, choose the mode based on raw entry count. Small vaults can run single-pass or sectioned. Large vaults should use partitioned parallel processing with batch manifests and targeted remediation before cleanup.
Key rules:
- <= 100 entries: single-pass, chronological, checkpoint every 15 entries
- 100-500 entries: sectioned, one agent, checkpoint every 30-50 entries or every source section
- 500+ entries: partitioned parallel, unique directory ownership, batch manifests, coverage check, remediation, then cleanup and breakdown
- Parallel agents may read _index.md, but cross-directory writes are forbidden
See references/scale-mode.md for the full mode table, partitioned parallel workflow, partition safety rules, and batch manifest format.
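As a rough illustration of the thresholds above, mode selection could be sketched as follows. The function name and the returned labels are hypothetical, and the boundary handling is an assumption: exactly 100 entries is treated as single-pass and exactly 500 as partitioned parallel, because the listed ranges overlap at those values.

```python
def choose_scale_mode(entry_count: int) -> str:
    """Map a raw entry count to one of the three absorb modes.

    Assumption: 100 counts as single-pass and 500 counts as
    partitioned parallel; the source ranges overlap at both values.
    """
    if entry_count < 0:
        raise ValueError("entry_count must be non-negative")
    if entry_count <= 100:
        return "single-pass"
    if entry_count < 500:
        return "sectioned"
    return "partitioned-parallel"


if __name__ == "__main__":
    for n in (42, 100, 101, 499, 500, 3000):
        print(n, "->", choose_scale_mode(n))
```

references/scale-mode.md remains the authority on the full mode table; this sketch covers only the entry-count cutoffs.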
Command: wiki absorb [date-range]
The core compilation step. Date ranges: last 30 days, 2026-03, 2026-03-22, 2024, all. Default, with no argument, is last 30 days. If raw/entries/ doesn't exist, run ingest first.
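A minimal sketch of how the date-range argument might be turned into an inclusive window. The function name is hypothetical, and two details are assumptions rather than documented behavior: "last 30 days" is read as today minus 30 days through today, and "all" is represented as None (no filtering).

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional, Tuple


def parse_date_range(arg: Optional[str], today: date) -> Optional[Tuple[date, date]]:
    """Return an inclusive (start, end) window, or None for "all"."""
    # With no argument, fall back to the documented default.
    text = (arg or "last 30 days").strip().lower()
    if text == "all":
        return None
    m = re.fullmatch(r"last (\d+) days", text)
    if m:
        return today - timedelta(days=int(m.group(1))), today
    if re.fullmatch(r"\d{4}", text):  # whole year, e.g. 2024
        year = int(text)
        return date(year, 1, 1), date(year, 12, 31)
    m = re.fullmatch(r"(\d{4})-(\d{2})", text)  # whole month, e.g. 2026-03
    if m:
        year, month = int(m.group(1)), int(m.group(2))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):  # single day
        day = date.fromisoformat(text)
        return day, day
    raise ValueError(f"unrecognized date range: {arg!r}")
```

Passing today explicitly keeps the function deterministic, which makes it easier to test than one that reads the system clock.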
Process entries one at a time. Read _index.md before each entry, match against existing articles, re-read every article before updating it, and identify what the new entry changes in your understanding. Write integrated sections or rich paragraphs, not bottom-appended scraps.
Key rules:
- Anti-Dump (C4): synthesize raw material into wiki prose. Never paste raw entry text verbatim, never paste more than 3 consecutive raw lines, and keep the raw-to-prose compression ratio at least 5:1.
- Article creation: named things with enough material get pages; patterns and themes get pages; do not create stubs.
- Anti-Cramming: if you are adding a third paragraph about a sub-topic, that sub-topic probably deserves its own page.
- Anti-Thinning: every touched page should get richer.
- Checkpoints: rebuild index and backlinks, audit new article creation, review quality, split bloated pages, and adjust directories at the cadence required by the chosen scale mode.
See references/command-details.md for the agent prompt template, full absorption loop, Anti-Dump examples and hard limits, article-creation rules, checkpoint cadence, and audit checklist.
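The "no more than 3 consecutive raw lines" limit from the Anti-Dump rule can be checked mechanically. The sketch below is one possible approach, not the bundled implementation: the function names are hypothetical, and it assumes a "raw line" means a non-blank line compared after stripping surrounding whitespace.

```python
def longest_verbatim_run(raw_text: str, article_text: str) -> int:
    """Longest run of consecutive raw lines that also appear consecutively,
    in the same order, in the article. Blank lines are ignored."""
    raw = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    art = [ln.strip() for ln in article_text.splitlines() if ln.strip()]
    best = 0
    # Longest-common-substring dynamic programming over lines, one row at a time.
    prev = [0] * (len(art) + 1)
    for raw_line in raw:
        cur = [0] * (len(art) + 1)
        for j, art_line in enumerate(art, start=1):
            if raw_line == art_line:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def violates_anti_dump(raw_text: str, article_text: str, max_run: int = 3) -> bool:
    """True when the article pastes more than max_run consecutive raw lines."""
    return longest_verbatim_run(raw_text, article_text) > max_run
```

This only catches exact line-level copying. The 5:1 compression ratio is not checked here, since the unit it is measured in (lines, words, or characters) is defined in references/command-details.md rather than in this overview.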
Command: wiki query <question>
Answer questions about the subject's life by navigating the wiki. Read _index.md, use _backlinks.json to identify central topics, read only the relevant articles, and synthesize an answer from the compiled wiki. Query is read-only.
Key rules:
- Read the wiki, not raw/entries/
- Lead with the answer and acknowledge gaps
- Follow relevant wikilinks and related: entries surgically, not exhaustively
- Do not modify any wiki files
See references/command-details.md for the full answer procedure, query patterns by article type, and query rules.
Command: wiki remediate
Scan for raw entries that are not yet cited in any wiki article, and launch targeted remediation agents to close the coverage gap. Run this after wiki absorb and before wiki cleanup.
Key rules:
- Treat citation coverage and content coverage as separate metrics
- Citation coverage is the completion gate and must reach 100%
- Use the bundled coverage helpers, classify gaps, and remediate per batch
- Add synthesized content when needed, but never violate Anti-Dump
- Create supplementary articles only as a last resort
See references/command-details.md for citation-versus-content coverage, process phases, when to run, and why remediation is separate from cleanup.
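Citation coverage, as described above, is the share of raw entries cited by at least one article. A minimal sketch, assuming each article's sources: list has already been parsed into a mapping from article stem to entry IDs; the function name and the input shapes are hypothetical, and the bundled coverage helpers may work differently.

```python
from typing import Dict, Iterable, List, Tuple


def citation_coverage(
    entry_ids: Iterable[str],
    article_sources: Dict[str, List[str]],
) -> Tuple[float, List[str]]:
    """Return (coverage fraction, sorted list of uncited entry IDs)."""
    all_ids = set(entry_ids)
    cited = set()
    for sources in article_sources.values():
        cited.update(sources)
    uncited = sorted(all_ids - cited)
    # An empty vault is treated as fully covered (assumption).
    if not all_ids:
        return 1.0, uncited
    return (len(all_ids) - len(uncited)) / len(all_ids), uncited


if __name__ == "__main__":
    coverage, uncited = citation_coverage(
        ["e1", "e2", "e3", "e4"],
        {"running": ["e1", "e2"], "sleep": ["e2", "e3"]},
    )
    print(f"{coverage:.0%} cited; remaining: {uncited}")
```

The completion gate corresponds to the returned list being empty. Content coverage is a separate, qualitative metric and is not captured by this check.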
Command: wiki cleanup
Audit and enrich every article in the wiki using parallel subagents. Cleanup is broad restructuring work after coverage is stabilized.
Key rules:
- Build full wiki context from _index.md and every article
- Assess structure, line count, tone, quote density, narrative coherence, wikilinks, and citations
- Apply the orphan policy carefully. Orphans are allowed, but still try to connect them without over-linking
- Restructure diary-driven pages into theme-driven narratives
- After cleanup, deduplicate candidates, create needed pages, fix broken links, then rebuild index and backlinks
See references/command-details.md for the cleanup agent prompt template, the three cleanup phases, orphan policy, and the Steve Jobs restructuring test.
Command: wiki breakdown
Find and create missing articles. Expands the wiki by identifying concrete entities and themes that deserve their own pages.
Key rules:
- Survey the wiki for bare directories, bloated pages, high-reference backlink targets without articles, and misclassified pages
- Mine concrete entities and significant named things, not generic technologies or passing mentions
- Rank candidates by reference count and classify them into directories
- Create articles in parallel, then add backlinks from existing pages
- With --reorganize, move misclassified articles to more appropriate directories
See references/command-details.md for the breakdown agent prompt template, all four phases, and reclassification rules.
Directory Taxonomy
Directories emerge from the data. Don't pre-create them. The reference taxonomy covers 39 common directories across 7 groups: core, media and culture, inner life and patterns, narrative structure, relationships and people, work and strategy, and other.
See references/taxonomy.md for the full directory list, types, and placement guidance.
Writing Standards
Key rules:
- The Golden Rule: This is not Wikipedia about the thing. This is about the thing's role in the subject's life.
- C1 Linking: use [[filename_stem|Human Readable Title]]. Obsidian resolves by filename, not by title:.
- C2 Filenames: ASCII snake_case only. Keep display names in title: and aliases:.
- C3 Citation: dual traceability. sources: is canonical and machine-readable. ## References is the human-readable restatement.
- C4 Anti-Dump: never paste more than 3 consecutive raw lines. Never exceed 150 content lines. Keep raw-to-prose compression at least 5:1.
- Tone: Wikipedia. Flat, factual, encyclopedic. No peacock words, editorial voice, rhetorical questions, progressive narrative, or emotional adjectives. Direct quotes carry the feeling.
- Aliases: include the filename stem, plain title, alternative names, and shortened forms.
- Schema: required fields are title, type, created, last_updated, aliases, and sources.
- Narrative: every article must have a point. Structure by role, meaning, phase, thesis, cycle, drift, reasoning, or setting rather than by diary chronology.
See references/writing-standards.md for the full tone rules, frontmatter schema, article format, linking resolution test, filename convention, citation discipline, aliases discipline, narrative coherence, structure-by-type table, quote discipline, and length targets.
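Rules C1, C2, and the schema rule lend themselves to small mechanical checks. The sketch below is illustrative only: the function names are hypothetical, and it assumes snake_case means lowercase ASCII letters and digits joined by single underscores, and that frontmatter has already been parsed into a dict.

```python
import re
from typing import Dict, List

REQUIRED_FIELDS = ("title", "type", "created", "last_updated", "aliases", "sources")

# Assumption: lowercase ASCII words or digits separated by single underscores.
_STEM_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_stem(stem: str) -> bool:
    """C2: the filename stem must be ASCII snake_case."""
    return _STEM_RE.fullmatch(stem) is not None


def wikilink(stem: str, title: str) -> str:
    """C1: link by filename stem and show the human-readable title."""
    if not is_valid_stem(stem):
        raise ValueError(f"not an ASCII snake_case stem: {stem!r}")
    return f"[[{stem}|{title}]]"


def missing_fields(frontmatter: Dict[str, object]) -> List[str]:
    """Schema: list required frontmatter keys that are absent."""
    return [field for field in REQUIRED_FIELDS if field not in frontmatter]
```

For example, wikilink("morning_runs", "Morning Runs") yields [[morning_runs|Morning Runs]], while a stem such as "Morning Runs" is rejected before a broken link can be written.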
Command: wiki sync
Pull documentation from multiple source projects and update raw/entries/.
This is the collection layer — it gathers source documents but does NOT
run LLM absorption. Think of it as "wiki ingest, but for many sources at once."
Quick Start
- Create sources.yaml in the project root (see references/sources-schema.md).
- Run:
  python scripts/sync_sources.py --config sources.yaml --wiki-root wiki/
- Review new entries in raw/entries/{source_name}/.
- Run wiki absorb (manually) to integrate new entries into wiki articles.
What It Does
- Reads sources.yaml for the list of source projects.
- For each source:
  - type: git → sparse-clone to .sync_cache/, checkout only doc_path
  - type: local → read from local path directly
  - type: obsidian → delegate to ingest_obsidian.py (existing behavior)
- Runs the appropriate ingest helper for each source:
  - ingest_obsidian.py for Obsidian vaults
  - ingest_projects.py for project doc/ folders
- Writes entries to raw/entries/{source_name}/.
- Merges all per-source ingest logs into unified raw/ingest_log.json.
- Runs rebuild_index.py and check_coverage.py automatically.
- Reports sync summary with added/updated/unchanged/deleted counts.
Incremental Sync
After the first run, sync_log.json tracks each file's content hash.
Subsequent runs only process changed files, skipping unchanged ones.
- Use --force to ignore the cache and re-process everything.
- Use --dry-run to preview changes without writing files.
- Use --source <name> to sync only one specific source.
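The incremental behavior described above amounts to comparing each file's current content hash against the one recorded on the previous run. A minimal sketch of that comparison, with several assumptions: the hash algorithm is taken to be SHA-256 (the document only says "content hash"), the function names are hypothetical, and the previous log is modeled as a plain path-to-hash mapping rather than the real sync_log.json layout.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(
    current: Dict[str, bytes],
    previous_hashes: Dict[str, str],
) -> Dict[str, List[str]]:
    """Classify paths as added, updated, unchanged, or deleted."""
    plan: Dict[str, List[str]] = {
        "added": [], "updated": [], "unchanged": [], "deleted": [],
    }
    for path, data in sorted(current.items()):
        digest = content_hash(data)
        if path not in previous_hashes:
            plan["added"].append(path)
        elif previous_hashes[path] != digest:
            plan["updated"].append(path)
        else:
            plan["unchanged"].append(path)
    # Anything logged last time but missing now was deleted at the source.
    plan["deleted"] = sorted(set(previous_hashes) - set(current))
    return plan
```

In these terms, --force would be equivalent to passing an empty previous_hashes mapping, and --dry-run to computing the plan without acting on it.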
Automation Levels
This command handles Level 1 (Source → Entries) automation:
| Level | What | Cost | Trigger |
|---|---|---|---|
| Level 1 | Source docs → raw/entries/ + index + backlinks | Seconds to minutes, no LLM | wiki sync (cron/timer/manual) |
| Level 2 | New entries → wiki article text | LLM cost per entry | wiki absorb (manual) |
Level 2 requires manual wiki absorb — see Phase 3 in the implementation plan.
ID Strategy
Project sources use source-prefixed IDs to prevent collisions:
sha1(f"{source_name}:{relative_path}")[:12]
Obsidian sources keep the original ID strategy (sha1(rel_path)[:12])
to preserve backward compatibility with existing article citations.
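Written out with Python's hashlib, the two ID schemes could look like the sketch below. The function names are hypothetical, and two details are assumptions not stated above: strings are encoded as UTF-8 before hashing, and the 12 characters are taken from the hexadecimal digest.

```python
import hashlib


def project_entry_id(source_name: str, relative_path: str) -> str:
    """Source-prefixed ID for project sources."""
    key = f"{source_name}:{relative_path}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def obsidian_entry_id(rel_path: str) -> str:
    """Original ID scheme, kept so existing article citations stay valid."""
    return hashlib.sha1(rel_path.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # The same relative path under two sources gets two different IDs.
    print(project_entry_id("project_a", "doc/setup.md"))
    print(project_entry_id("project_b", "doc/setup.md"))
```

The source prefix is what prevents collisions: two projects that both contain doc/setup.md would otherwise hash to the same entry ID.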
Deletion Handling
When a source file is deleted:
- The corresponding entry in raw/entries/{source}/ is removed.
- The entry is removed from sync_log.json and ingest_log.json.
- Wiki articles that cited the deleted entry are NOT automatically modified.
- Run check_coverage.py to identify orphaned citations.
When to Run
- After adding new docs to any source project's doc/ folder
- On a schedule (daily cron, systemd timer, GitHub Actions)
- Before wiki absorb to ensure entries are up to date
- After changing sources.yaml (adding/removing sources)
See Also
- references/sources-schema.md: Full sources.yaml schema documentation
- references/automation-guide.md: GitHub Actions and systemd timer setup
Command: wiki rebuild-index
Rebuild _index.md and _backlinks.json from current wiki state. Exclude meta files, README.md, and optional .wikiignore entries. Generate index entries with [[filename_stem|Display Title]], preserve aliases in the index line, and verify that every wikilink resolves through a filename or alias.
See references/command-details.md for exclusions, required format, reference implementation, and verification steps.
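To make the backlink rebuild and the resolution check concrete, here is a minimal in-memory sketch. It is not the reference implementation: the function names are hypothetical, articles are modeled as a mapping from filename stem to body text, and the output shape (target stem to sorted list of linking stems) is an assumption about _backlinks.json.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches [[target]], [[target|Display]], and [[target#Section|Display]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_backlinks(articles: Dict[str, str]) -> Dict[str, List[str]]:
    """Reverse link index: target stem -> stems of articles linking to it."""
    backlinks = defaultdict(set)
    for stem, body in articles.items():
        for match in WIKILINK_RE.finditer(body):
            target = match.group(1).strip()
            if target != stem:  # ignore self-links
                backlinks[target].add(stem)
    return {target: sorted(srcs) for target, srcs in sorted(backlinks.items())}


def unresolved_targets(articles: Dict[str, str], aliases: Dict[str, str]) -> List[str]:
    """Link targets that match neither a filename stem nor a known alias."""
    known = set(articles) | set(aliases)
    return sorted(t for t in build_backlinks(articles) if t not in known)
```

An empty result from unresolved_targets corresponds to the verification step above: every wikilink resolves through a filename or an alias.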
Command: wiki reorganize
Step back and rethink wiki structure. Read the index, sample articles, and ask: merge, split, new categories, orphan articles, missing patterns. Execute changes, then rebuild index.
See references/command-details.md for the full reorganize command note.
Command: wiki status
Report current wiki state using the standard structure: Ingestion, Articles, Coverage, Quality, and See also.
See references/command-details.md for the exact output template.
Principles
- You are a writer. Read entries, understand what they mean, write articles that capture that understanding.
- Every entry ends up somewhere. Woven into the fabric of understanding, not mechanically filed.
- Articles are knowledge, not diary entries. Synthesize, don't summarize.
- Concept articles are essential. Patterns, themes, arcs. These are where the wiki becomes a map of a mind.
- Revise your work. Re-read articles. Rewrite the ones that read like event logs.
- Breadth and depth. Create pages aggressively, but every page must gain real substance. 40 stubs is as bad as 5 bloated articles.
- The structure is alive. Merge, split, rename, restructure freely.
- View photos. Understand what they show and integrate them into the narrative.
- Connect, don't just record. Find the web of meaning between entities.
- Cite sources. Every claim traces back to a raw entry ID.
Concurrency Rules
- Never delete or overwrite a file without reading it first.
- Re-read any article immediately before editing it.
- Never modify _absorb_log.json directly.
- Rebuild _index.md and _backlinks.json only at the very end of a command.
- In partitioned mode, each agent owns exactly one directory and writes only there.
- Cross-directory reads are allowed. Cross-directory writes are not.
Migration from v1.0.0
See references/migration.md for backward-compatibility policy, recommended migration steps, when full migration is required, and lint warning versus error behavior.