auto-wiki - SKILL.md Agent Skill

name: auto-wiki
version: 0.3.0
description: |
Knowledge compiler: teaches the agent to incrementally compile source files into a persistent wiki, enabling cross-session knowledge accumulation. Runtime dependencies: Python 3.8+ (stdlib + pydantic). Optional enhancements: WebSearch (active search), external MCP validators (logic validation).

Four modes plus one combined pipeline (deep-dive), auto-routed by user intent:

recall → User wants questions answered from accumulated knowledge. Trigger words: recall, knowledge mode, open the wiki, answer with the wiki, according to the wiki, based on what we've accumulated, check the wiki, does the wiki have, we researched this before, from our earlier notes.

ingest → User provided new material to compile into the wiki. Trigger words: ingest, compile, compile this, digest this, study this, archive this, add this in, accumulate, research this, organize this for me, add to the knowledge base.

query → User asked a specific question and wants an answer from the wiki (one-shot). Trigger words: query, answer from the wiki, what does the wiki say, look it up.

lint → User wants to check wiki health. Trigger words: lint, check the wiki, wiki health, clean it up, any contradictions.

deep-dive → User wants the agent to find knowledge gaps automatically and fill them. Trigger words: deep-dive, deep dive, fill the gaps, hunt for gaps, go hard, auto-complete, knowledge completion, comprehensive fill. Note: deep-dive is not a standalone mode; it is a combined pipeline of lint (Coverage) + ingest (search-fill).

Routing rules: if the user provides no new material but mentions the wiki or domain knowledge → recall. If the user provides files or large blocks of text → ingest. If the user says "deep-dive" or "go hard" → run the deep-dive pipeline. If unsure → ask the user.

Knowledge Compiler

The agent does research, pulls data, writes reports — the wiki threads these outputs together. The more you use the agent, the better it understands your domain.

Runtime Dependencies & Permission Declaration

Dependency	Required?	Notes
Python 3.8+	Required	`schema.py` (frontmatter validation), `store.py` (SQLite data management), `build_index.py` (FTS5 indexing) are all Python scripts. They use only the standard library (`sqlite3`, `json`, `pathlib`) plus `pydantic`
pydantic	Required	Needed by `schema.py` frontmatter validation. `pip install pydantic`
Filesystem write	Required	Creates and edits Markdown, SQLite, and `.obsidian/` config under `wiki/{topic}/`. Confirms the location with the user when first creating `wiki/`
WebSearch / WebFetch	Optional	Needed for active mode (agent searches for materials autonomously). Not needed for passive mode (user provides files)
External validator (MCP)	Optional	Only invoked by lint when the wiki declares a validator. Silently skipped when unreachable, with no effect on the other checks. No user credentials are required: `Mcp-Session-Id` is the standard MCP protocol session handshake, completed automatically by the agent
Search-class MCP	Optional	deep-dive and active ingest can use domain data MCPs to enhance search quality. Falls back to WebSearch when absent

Core promise: passive mode (user provides files → agent compiles) needs only Python 3.8+ with pydantic and file read/write, with no network dependency at run time. All network calls are optional enhancements, and an environment check on first use tells the user which capabilities are available.

Quick Start

User: /auto-wiki recall personal-pension
Agent: [Scan wiki/personal-pension/ → read hub page ({Domain}.md, named in English) → load data.db summary]
Agent: Entered recall mode. Current wiki: 22 pages / 8 data points / 2 contested.
       From now on I will check the wiki before answering.

User: What causes the low participation rate?
Agent: [Read wiki pages such as enrollment-friction, tax-incentive-effect]
Agent: Based on the 6 sources accumulated in the wiki... (citing specific pages and data)
       Note: the tax-incentive effect is contested (77.8% vs 25%), see [[participation-willingness]]

Core Idea

The agent does research, writes reports, and pulls data for you every day — then forgets it all. Next time you ask about the same domain, it starts from zero.

This Skill solves one thing: give the agent a knowledge base that keeps accumulating.

It is not RAG (ad-hoc retrieval from a document pile every time) — it is compilation. After reading a source file, the agent writes the key information into existing wiki pages, comparing against old knowledge, merging, and flagging conflicts. Before executing any future task, it reads the wiki first and works from the accumulated base.

Four Modes

Mode	Trigger	What the agent does
recall	`recall` / `recall {topic}`	Load wiki context; all subsequent questions check the wiki before answering
ingest	User provides source files or text	Read source → search existing wiki → compare old vs new → update/create pages → update index
query	User asks a question (one-shot)	Read hub page → find relevant pages → synthesize an answer → archive valuable analysis
lint	User says "check the wiki"	Scan all pages → merge duplicates → archive stale pages → report contradictions and health
deep-dive	`deep-dive` / "go hard"	Run Coverage lint → show gap report → user confirms → search + ingest to fill gaps

deep-dive is not a fifth standalone mode — it is a combined pipeline of lint (Coverage) and ingest (with search tools). Requires search tools (active mode).

recall mode vs query: query is a one-shot operation (one question, one wiki lookup). recall mode is a persistent state — once entered, every question in the conversation goes through the wiki first.
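The routing rules from the skill description can be sketched as a small function. This is a minimal illustration rather than the skill's actual router: the name `route`, the `has_material` flag, the shortened trigger lists, and the order in which the rules are tried are all assumptions, and a real agent would judge intent semantically instead of by substring match. The sketch also does not separate one-shot query from recall.

```python
from typing import Optional

# Hypothetical subsets of the trigger phrases listed in the skill description.
DEEP_DIVE_TRIGGERS = ("deep-dive", "deep dive", "go hard", "fill the gaps")
WIKI_MENTIONS = ("wiki", "recall", "earlier notes")


def route(user_text: str, has_material: bool) -> Optional[str]:
    """Return a mode name, or None when the agent should ask the user."""
    text = user_text.lower()
    if any(trigger in text for trigger in DEEP_DIVE_TRIGGERS):
        return "deep-dive"  # explicit pipeline request (checked first by assumption)
    if has_material:
        return "ingest"     # files or large blocks of text were provided
    if any(mention in text for mention in WIKI_MENTIONS):
        return "recall"     # no new material, but the wiki is mentioned
    return None             # unsure: ask the user
```

Returning `None` instead of guessing mirrors the "if unsure, ask the user" rule.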

recall Mode

Entering

Triggered when the user says /auto-wiki recall or /auto-wiki recall {topic}.

The agent executes:

Scan the wiki/ directory and list available wiki topics
If the user specified a topic → load that wiki; if not → list the available topics for the user to choose
Read the hub page ({Domain}.md — the main page in the wiki root named after the domain in English, matching meta.yaml name) → get the full page list and structure
Read the data.db summary → run python references/store.py dump wiki/{topic}/ to get the counts of data points, relations, and contested items

Report to the user:

Entered recall mode: {topic}
- Pages: {N} (sources: X, entities: Y, concepts: Z)
- Data points: {N} | Relations: {N} | Contested: {N}
From now on I will check the wiki before answering. Say "exit recall mode" to return to normal.

Answering Flow

In recall mode, for each user question:

Extract keywords from the question (entity names, concept names, indicator names)
Match relevant pages in the hub page (title + description)

Query relevant data points in data.db:

SELECT * FROM data_points WHERE field LIKE '%keyword%' OR page_slug LIKE '%keyword%'

Read the matched wiki pages (typically 2-5), expanding one hop along wikilinks
Synthesize the answer, which must:
- Cite specific pages: [[slug]]
- Cite specific data: value + unit + period + source
- Proactively flag any contested information involved
- If the wiki lacks the information, say explicitly "the wiki has nothing accumulated on this yet; suggest ingesting XX"
Never fabricate information the wiki does not contain. Better to say "I don't know" than to pretend the wiki has it
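The data.db lookup in the answering flow can be run from Python with bound parameters, so that a keyword containing quotes cannot break the statement. The sketch below is illustrative only: the table name `data_points` and the columns `field` and `page_slug` come from the query above, while the helper name `find_data_points`, the remaining columns, and the demo rows are assumptions made for the example.

```python
import sqlite3


def find_data_points(conn: sqlite3.Connection, keyword: str) -> list:
    """Return rows whose field or page_slug contains the keyword."""
    pattern = f"%{keyword}%"
    # Bound parameters keep quotes in the keyword out of the SQL text.
    # LIKE wildcards (% and _) inside the keyword are not escaped here.
    return conn.execute(
        "SELECT * FROM data_points WHERE field LIKE ? OR page_slug LIKE ?",
        (pattern, pattern),
    ).fetchall()


# Demo on an in-memory table. Only field and page_slug are named in the text;
# the other columns and both rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_points "
    "(page_slug TEXT, field TEXT, value REAL, unit TEXT, period TEXT)"
)
conn.executemany(
    "INSERT INTO data_points VALUES (?, ?, ?, ?, ?)",
    [
        ("tax-incentive-effect", "participation_rate", 25.0, "%", "demo"),
        ("enrollment-friction", "drop_off_rate", 40.0, "%", "demo"),
    ],
)
rows = find_data_points(conn, "participation")
```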

Exiting

Exit when the user says "exit recall mode", switches to another operation (ingest/lint), or starts a new topic.

Execution Flow

Phase 0: Identify Target Domain & Ontology Type

On receiving user input, determine three things: operation type, target domain wiki, ontology type.

Wikis are organized by domain, not by research topic. A research topic (e.g. "the Fed's hiking cycle") is demoted to a single page under {domain}/analyses/ rather than getting its own wiki. First identify which domain the knowledge belongs to (macro / credit / ...), then land it in that domain's directory. See references/storage-spec.md.

User input	Operation	Target domain wiki	Ontology type
"Organize this monetary-policy research report for me" + file	ingest	macro	domain
"Ingest into macro" + file	ingest	macro	domain
"Research Charlie Munger" + material	ingest	charlie-munger	cognitive
"How much room is left for PBoC reserve-requirement cuts?"	query	macro	—
"Check the macro wiki"	lint	macro	—

The ontology type determines the wiki's node types and collection strategy:

Ontology type	Research object	Node types	Authoritative contract / reference
domain	A domain (institutions, instruments, indicators, mechanisms, events)	Entities (institutions/instruments/indicators) · Concept-mechanisms · Events · Analyses · Sources	Each domain's `wiki/{domain}/_ontology.md` + `references/ontology-types/domain.md`
cognitive	A person (mental models, decision-making style)	mental-model · Concepts · Sources · Analyses	`references/ontology-types/cognitive.md`

Each domain wiki's ontology is defined by its own wiki/{domain}/_ontology.md contract (node types, controlled relation vocabulary, six-tier time model, retirement protocol); read it before ingest/recall. A domain uses one ontology type only — never mix cognitive and domain structures within the same domain.

If the wiki directory does not exist, first confirm the creation location with the user (default wiki/{topic}/ under the current repository root), then create the initial structure per references/storage-spec.md (including meta.yaml, hub page template, log.md template). wiki/ goes under git version control and must not be added to .gitignore — it has to appear in the Obsidian graph (do not use a .wiki/ dot directory; Obsidian hides dotfolders).

Domain seed: if the target domain has a matching seed file (seeds/{name}.md), declare seed: {name} in meta.yaml. The seed provides a standard term vocabulary, relation templates, and no-confusion rules, so the wiki grows from a normalized starting point. Domains without a seed grow freely — both paths work. Seeds are community-contributable plugins: anyone can write a markdown file for their own vertical. See references/seed-ontologies.md.

On first use, run the environment check (see references/source-validation.md) and inform the user of currently available capabilities (passive mode vs active mode).

Reference Loading Strategy

Do not read all references at once. Load on demand by operation type:

Operation	Must read	Read on first use	Read when tools available
ingest	`ingest-protocol.md`, `wiki-format.md`, `schema.py`	`storage-spec.md` (when wiki doesn't exist), `seed-ontologies.md` + `seeds/{name}.md` (when meta.yaml declares a seed)	`fact-check.md`, `source-validation.md`
query	`query-protocol.md`	—	—
lint	`lint-protocol.md`, `schema.py`	—	`validators/{name}.md` (when the seed declares a validator)
deep-dive	`lint-protocol.md`, `ingest-protocol.md`, `source-validation.md`, `wiki-format.md`, `schema.py`	`storage-spec.md` (when wiki doesn't exist)	`fact-check.md`

Usually not needed: scaling.md is only relevant when page count > 500; ontology-types/ only when creating a new wiki and deciding its type.

Phase 1: Ingest (Knowledge Compilation)

This is the core operation. Detailed protocol in references/ingest-protocol.md.

Brief flow:

Read the source file and extract key information
Validate key data (if tools are available) — see references/fact-check.md
Write the source summary page (sources/{date}-{slug}.md)
Search the wiki for existing related pages (read the hub page, grep key entity names)
Compare new vs old information page by page:
- New info supports an existing conclusion → add citation, raise confidence
- New info overturns an existing conclusion → write the value into data.db (old value auto-enters the history table), rewrite the body analysis
- New info contradicts and cannot be adjudicated → present both claims side by side, confidence → contested
Create new pages (only for entities/concepts the wiki does not yet have)
Update the hub page + append to log.md
Schema validation — run python references/schema.py {page.md} on every page created/modified in this pass, ensuring frontmatter conforms. Fix immediately on failure before continuing
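The three comparison outcomes in the flow above can be pictured as a small state update on a page record. This is a rough sketch under stated assumptions, not the protocol in references/ingest-protocol.md: the `Page` record, the `apply_comparison` helper, the outcome labels, and the low/medium/high confidence ladder are hypothetical (only the `contested` value is named in the text), and the data.db write is represented as a to-do string rather than performed.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed confidence ladder; the text only names "contested".
LADDER = ["low", "medium", "high"]


@dataclass
class Page:
    slug: str
    confidence: str = "medium"
    sources: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def apply_comparison(page: Page, outcome: str, source: str) -> Page:
    """Apply one comparison outcome (supports / overturns / unresolved)."""
    # Assumption: the new source is cited on the page for every outcome.
    page.sources.append(source)
    if outcome == "supports":
        # Raise confidence one step; a contested page stays contested.
        if page.confidence in LADDER:
            step = LADDER.index(page.confidence) + 1
            page.confidence = LADDER[min(step, len(LADDER) - 1)]
    elif outcome == "overturns":
        page.actions += ["write new value to data.db", "rewrite body analysis"]
    elif outcome == "unresolved":
        page.confidence = "contested"
        page.actions.append("present both claims side by side")
    else:
        raise ValueError(f"unknown outcome: {outcome}")
    return page
```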

After ingest, report to the user:

Ingested into the {topic} wiki:
- Created: {N} pages (list them)
- Updated: {N} pages (list + brief reason for each change)
- Conflicts: {N} (list the contradictions)
- Validation: all {N} pages passed / {M} pages have issues (list them)
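The schema-validation step and the Validation line of the report can be driven by a short loop over the pages touched in the pass. The sketch below assumes that `references/schema.py` signals a failed validation with a non-zero exit code, which the text does not state; the helper name `validate_pages` and its injectable `run` parameter are also hypothetical.

```python
import subprocess
import sys
from typing import Callable, Iterable, List


def validate_pages(
    pages: Iterable[str],
    schema_script: str = "references/schema.py",
    run: Callable = subprocess.run,
) -> List[str]:
    """Run the schema script on each page and return the pages that failed."""
    failed = []
    for page in pages:
        # Assumption: a non-zero exit code means the frontmatter is invalid.
        result = run(
            [sys.executable, schema_script, page],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failed.append(page)
    return failed
```

An empty return value corresponds to "all {N} pages passed"; otherwise the returned list is what the report enumerates.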

Phase 2: Query (Knowledge Lookup)

Detailed protocol in references/query-protocol.md.

Read the hub page and identify pages relevant to the question
Read the matched pages + expand one hop along wikilinks

Synthesize an answer from page content, citing source pages:

Based on 5 source files accumulated in the wiki:
... analysis ...
Sources: [[alpha-corp]], [[2026-policy-doc]]

If contested information is involved, flag the contradiction explicitly
If the answer contains valuable new analysis, suggest archiving it

If the wiki lacks enough information to answer, state the gap explicitly:

The wiki has insufficient information on XX; currently only 2 related source files.
Suggest ingesting more material about XX.

Phase 3: Lint (Knowledge Governance)

Detailed protocol in references/lint-protocol.md (7 checks + health report format).

Lint has two tiers:

Tier	Trigger	Checks	Cost
Structural (default)	`lint` / "check the wiki"	Validation, Orphan, Broken Link, Staleness	Full scan, deterministic
Semantic (on demand)	"deep lint" / "check contradictions"	Contradiction, Duplication, Coverage	Agent semantic understanding, scope-controlled

Structural tier: automatically scan all pages; fix formatting, broken links, orphan pages, staleness flags
Semantic tier (user-triggered): detect contradictions, duplication, coverage gaps. Wiki < 50 pages → full scan; 50-200 pages → only pages touched by ingests in the last 30 days; > 200 pages → user must specify scope
Report health:

Wiki health report: {topic}
- Total pages: 42 (entities: 15, concepts: 10, sources: 12, analyses: 5)
- Health: good
- Structural fixes: fixed 1 broken link, archived 1 stale page
- [Semantic] Pending human confirmation: 2 contradictions (listed)
- Suggestion: domain XX has few source files (only 1); consider adding more
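The scope rule for the semantic tier (full scan under 50 pages, recently ingested pages from 50 to 200, user-specified scope above 200) can be written down directly. In this sketch the function name `semantic_lint_pages` and the `last_ingest` mapping from slug to the date of the last ingest that touched the page are assumptions; how the skill actually tracks that date is not described here.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def semantic_lint_pages(
    last_ingest: Dict[str, date], today: date, window_days: int = 30
) -> Optional[List[str]]:
    """Pick pages for the semantic tier; None means the user must give a scope."""
    page_count = len(last_ingest)
    if page_count < 50:
        return sorted(last_ingest)  # small wiki: full scan
    if page_count <= 200:
        cutoff = today - timedelta(days=window_days)
        # Medium wiki: only pages touched by an ingest inside the window.
        return sorted(s for s, d in last_ingest.items() if d >= cutoff)
    return None  # large wiki: ask the user for a scope
```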

Phase 4: Deep-Dive (Knowledge Completion Pipeline)

deep-dive = lint (Coverage) + ingest (search-fill). Not a standalone mode — a combined pipeline.

Prerequisite: requires search tools (active mode). Without search tools, only output the gap report; do not auto-fill. Prompt the user to ingest manually.

Flow:

1. Run the lint Coverage check (5 gap-detection categories)
   → Output a structured Gap Report (see lint-protocol.md)

2. Show the Gap Report and ask the user to confirm
   → The user may: accept all / pick a subset / narrow the scope / cancel
   → This step cannot be skipped — it prevents unsupervised bulk writes

3. For each confirmed gap, run the from-lint ingest flow
   → Search → user confirms sources → standard ingest
   → See the From-Lint section of ingest-protocol.md

4. Output the completion report: filled / unfilled / suggestions

Trigger words: deep-dive, "deep dive", "fill the gaps", "hunt for gaps", "go hard"

Example:

User: deep-dive treasury-futures
Agent: [Running Coverage lint...]
Agent: Found 6 knowledge gaps:
       1. [high] page_missing: stock-bond-correlation (referenced by 4 pages)
       2. [high] concept_missing: basis (mentioned in 5 entity pages)
       3. [medium] single_source: treasury-futures-basics (only 1 source)
       ...
       Which should I fill? (all / pick numbers / cancel)

User: 1, 2

Agent: [Searching "stock-bond correlation treasury futures"...]
Agent: Found 2 candidate sources:
       - [authoritative-secondary] a major broker's fixed-income report "Stock-Bond Correlation Analysis" ← recommended
       - [secondary] a WeChat blog post ← skip (blacklisted channel)
       Confirm using the broker report?

User: Confirm

Agent: [Running standard ingest → created concepts/stock-bond-correlation.md]
Agent: Completion done. Created 2 pages, updated 0, 1 gap unfilled (suggest providing material manually).
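The confirmation step of the pipeline (accept all, pick numbers, or cancel) amounts to parsing a short reply against the numbered Gap Report. A minimal sketch, assuming the reply formats shown in the example above; the helper name `select_gaps` and the choice to return `None` for an unclear reply are hypothetical.

```python
from typing import List, Optional


def select_gaps(reply: str, gaps: List[str]) -> Optional[List[str]]:
    """Map a reply such as "all", "1, 2", or "cancel" to the chosen gaps.

    Returns None when the reply is not a clear choice, so the caller can ask
    again instead of writing to the wiki.
    """
    text = reply.strip().lower()
    if text == "cancel":
        return []
    if text == "all":
        return list(gaps)
    try:
        numbers = [int(part) for part in text.split(",")]
    except ValueError:
        return None  # not a list of numbers
    if any(n < 1 or n > len(gaps) for n in numbers):
        return None  # a number outside the report
    return [gaps[n - 1] for n in numbers]
```

Treating anything ambiguous as "ask again" keeps the confirmation step from being skipped by accident.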

Wiki Page Format

See references/wiki-format.md. In brief:

Every page is markdown with frontmatter (title, type, created, updated, sources, confidence; entities additionally have subtype/aliases, mechanisms have durability, events have event_date)
domain node types: entities (subtype institution/instrument/indicator) · concept-mechanisms · events · analyses · sources (cognitive wikis additionally have mental-model)
Numeric values are never nodes (they go into data.db); classification tags are edges, not pages; relations use the controlled relation vocabulary
Use [[slug]] for inter-page links (slug = filename = wikilink = data.db primary key, English slugs)
The hub page is named after the domain name in English (e.g. Macro.md) — it is the directory page + graph hub; log.md is the operations log
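Because slug, filename, and wikilink target are the same string, broken links and orphan pages can be found with a plain scan over page bodies. The sketch below is an illustration, not the lint implementation: `check_links` is a hypothetical helper, pages are passed as a slug-to-body mapping instead of being read from disk, and an orphan is assumed to mean "no inbound link from another page, hub excluded".

```python
import re
from typing import Dict, List, Set, Tuple

# Matches [[slug]] as well as the Obsidian forms [[slug|label]] and [[slug#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def check_links(
    pages: Dict[str, str], hub: str
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (broken link targets per page, orphan slugs)."""
    broken: Dict[str, List[str]] = {}
    linked: Set[str] = set()
    for slug, body in pages.items():
        for target in (t.strip() for t in WIKILINK.findall(body)):
            if target not in pages:
                broken.setdefault(slug, []).append(target)
            elif target != slug:
                linked.add(target)  # self-links do not count as inbound
    orphans = sorted(s for s in pages if s not in linked and s != hub)
    return broken, orphans
```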

Ontology Type Reference

When the research object is a domain, the authority is that domain's wiki/{domain}/_ontology.md contract; collection strategy in references/ontology-types/domain.md — nodes emphasize institution/instrument/indicator entities, mechanisms, events, and quantitative indicators (values go into data.db).

When the research object is a person, see references/ontology-types/cognitive.md — nodes emphasize mental models, heuristics, value systems, expressive style.

Both share the same wiki infrastructure (ingest/query/lint, data.db bitemporal tables, retire-never-delete); they differ only in node taxonomy and collection emphasis.

Vertical Domain Adaptation

The Skill core is a domain-agnostic compilation engine. Vertical-domain expertise is injected through two plugin layers:

Layer	Carrier	Role	Required?
Seed	`seeds/{name}.md`	Cold-start vocabulary: standard terms, relation templates, no-confusion rules	Optional
Validator	`validators/{name}.md`	Runtime logic validation: relation legality, completeness of required relations	Optional

Without plugins, the wiki grows freely — good for exploratory research. With plugins, the wiki starts from industry standards: normalized concept naming, clear relation structure, detectable logic gaps.

Community-contributable: write a seed file (markdown) for your vertical, declaring 20-50 core terms plus no-confusion rules, and wikis in that domain will grow from a normalized starting point.

Currently available:

seeds/fibo-pensions.md — enterprise annuities / pensions (based on the FIBO standard)
validators/fibo-mcp.md — FIBO SPARQL logic validation (627K inferred triples)

What This Skill Does Not Do

No vector retrieval. Small scale relies on the hub page + grep; large scale relies on SQLite FTS5 + BM25 (see references/scaling.md). Vector retrieval is left to platform-level tools.
No multi-user collaboration. The wiki directory is local files — one user, one wiki.
No replacement for professional data tools. Use the corresponding MCP/tools for domain data; this Skill only catches their output and compiles it into the wiki.
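For the large-scale path mentioned above, SQLite's FTS5 extension provides BM25 ranking through its built-in `bm25()` function. The following is a minimal sketch, not the contents of `build_index.py` or references/scaling.md: the table name `page_index`, its two columns, and the helper names are assumptions, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3
from typing import Dict, List


def build_fts_index(pages: Dict[str, str]) -> sqlite3.Connection:
    """Index slug -> body pairs in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE page_index USING fts5(slug, body)")
    conn.executemany("INSERT INTO page_index VALUES (?, ?)", pages.items())
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[str]:
    """Return slugs ranked by BM25 (lower bm25() values are better matches)."""
    # The query is passed as FTS5 query syntax; characters such as hyphens
    # would need quoting before being used as search terms.
    rows = conn.execute(
        "SELECT slug FROM page_index WHERE page_index MATCH ? "
        "ORDER BY bm25(page_index) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


index = build_fts_index(
    {
        "basis": "the basis is the spread between futures and spot prices",
        "treasury-futures-basics": "contract specifications for treasury futures",
        "stock-bond-correlation": "how stocks and bonds move together",
    }
)
hits = search(index, "basis")
```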

Relationship to Other Tools

This Skill replaces no professional tool — it chains them:

Any research tool produces analysis → ingest into the matching wiki
Any data tool pulls data            → ingest into the matching wiki
Domain seeds provide a starting line → standard terms + no-confusion rules
External validators correct logic    → lint checks knowledge-structure integrity

Before the next task, the agent reads the relevant wiki first → works with accumulated knowledge