name: wiki-ingest
description: Use when the user asks to ingest, import, or process one or more notes; mentions a raw note file path; provides a Confluence URL or page title; or says "ingest note", "ingest notes", "ingest new notes", or "ingest files". NOT for Slack messages; use wiki-fetch-slack instead.
Knowledge Base - Ingest
Session 1 — coordinator
When asked to "ingest new raw notes" (or similar):
1. Convert raw files (run automatically before partitioning):

       python3 scripts/system/convert-vtt-to-md.py --input-dir raw/transcripts
       python3 scripts/system/convert-eml-to-md.py --input-dir raw/emails
       python3 scripts/system/convert-html-to-md.py --input-dir raw/emails

   These move each VTT/EML/HTML source file into a `_resources/` subdirectory and write a companion `.md` (same stem) where the source used to live (containing frontmatter, an `![[embed]]` of the original, and the extracted text in a collapsed `> [!ocr-extractor]- Extracted text` callout) so the Markdown is picked up by the batch importer. Skip silently if the input directories don't exist.
2. Partition (run automatically):

       bash scripts/system/wiki-create-import-batches.sh

   - Default max batch size is 10 files. Override with `--max-files-per-batch N` (e.g. `--max-files-per-batch 20`).
   - This removes any old `.import/batch-import-*.txt` remnants and creates fresh ones.
   - The partitioner respects Markdown frontmatter `ingest: false` (also `ingest:false`, quoted `false`, and case variants). Protected notes are left out of batch files, as are local `raw/` files explicitly linked from those notes via wikilinks/embeds or Markdown links/images. The script prints the skipped note basenames and linked-file counts, but writes no skip entries to `wiki/log.jsonl`.
   - If the script exits with code 3: there are no new notes to ingest. Report "Nothing to ingest" and stop.
   - If the script exits with code 2: a previous ingest was not completed. Ask the user what to do (use `AskUserQuestion` when available; otherwise ask a concise plain-text question), with these options:
     - "Ingest next batch": stop here and tell the user: "Use `wiki-ingest-next-batch` (or say `ingest next batch`) in a new session to continue."; do NOT re-run `wiki-create-import-batches.sh`.
     - "Abort previous ingestion and restart importing new notes": re-run `bash scripts/system/wiki-create-import-batches.sh --force` to wipe old batches, then continue with this flow from step 3.
     - "Abort": stop immediately and do nothing.
   - Check the exit code explicitly after running the script: `bash scripts/system/wiki-create-import-batches.sh; echo "EXIT:$?"` and look for `EXIT:2` or `EXIT:3`.
3. Check how many batches have content: count non-empty `.import/batch-import-*.txt` files (the script prints the count).
   - If only 1 batch has content: process it (step 4) and immediately proceed to Finalization: say "Batch done. Say `finalize ingest` (or `/wiki-finalize-ingest`) to wrap up."
   - If 2+ batches have content: tell the user "Batches ready. Open N more sessions and say `ingest next batch` in each one. I'll start batch 1 now. When all sessions are done, come back here.", then proceed to step 4.
4. Process batch 1: first claim it atomically:

       mv .import/batch-import-1.txt .import/batch-import-1.claimed.txt

   Then read `.import/batch-import-1.claimed.txt`. Dispatch sub-agents in batches of 10 when the current client supports and permits delegation; otherwise process the files in the current session in manageable chunks. Each sub-agent prompt must begin with: "Invoke `wiki-ingest-per-note` before processing. Write session logs to `.import/batch-log-1.jsonl`. Then ingest these files: [list]." After all sub-agents finish, delete `.import/batch-import-1.claimed.txt`.
5. After all batch processing is complete (batch 1 done here + all batch agents done): run finalization with the `wiki-finalize-ingest` skill, or dispatch one `wiki-finalize-ingest` agent when the current client supports and permits it. Report the batch summary to the user while finalization runs.
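
The partition, count, and claim steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names and action labels are hypothetical, exit code 0 is assumed to mean "batches created", and batch files are assumed to hold one path per line.

```python
import os
from pathlib import Path
from typing import List, Optional

# Exit codes 2 and 3 come from the partition step; 0 = success is an assumption.
# The action labels are illustrative names, not part of the script.
EXIT_ACTIONS = {
    0: "continue",           # batches created, go on to counting them
    2: "ask_user",           # a previous ingest was not completed
    3: "nothing_to_ingest",  # no new notes
}


def action_for_exit_code(code: int) -> str:
    """Map the partitioner's exit code to the coordinator's next action."""
    return EXIT_ACTIONS.get(code, "unexpected_error")


def count_nonempty_batches(import_dir: Path) -> int:
    """Count unclaimed batch files that contain at least one byte."""
    return sum(
        1
        for p in import_dir.glob("batch-import-*.txt")
        if not p.name.endswith(".claimed.txt") and p.stat().st_size > 0
    )


def claim_batch(import_dir: Path, n: int) -> Optional[List[str]]:
    """Claim batch n by renaming it; return its file list, or None if already taken.

    A rename within one filesystem is atomic on POSIX, so only one session
    can succeed; the others see FileNotFoundError.
    """
    src = import_dir / f"batch-import-{n}.txt"
    dst = import_dir / f"batch-import-{n}.claimed.txt"
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        return None
    return [ln.strip() for ln in dst.read_text(encoding="utf-8").splitlines() if ln.strip()]
```

Renaming before reading is the point of the design: a session that loses the race gets `None` and never sees the file list, so two sessions cannot process the same batch.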
Freshness migration behavior
Ingest should now be treated as evidence collection plus candidate generation, not an unconditional broad rewrite of canonical pages:
- New notes may still create or update Wiki pages, but avoid page-wide regeneration.
- When a new note appears to affect an existing canonical page, prefer a minimal local edit or a review candidate.
- Durable supersession still requires explicit evidence and human confirmation. Do not infer `superseded_by` just because a newer note discusses the same topic.
- After finalization, run the one-command freshness check to identify one-page curation candidates:

      scripts/wiki-freshness.sh --root .

  When using `scripts/wiki-ingest.sh`, this already happens automatically after doctor and QMD sync; just report its result.
- If drift candidates are meaningful, suggest `wiki-curate-page` for the highest-value page instead of re-ingesting the same raw notes.
Confluence ingestion
Triggered by a Confluence URL or page title:
- Fetch via the available Atlassian MCP tools (authenticate first if needed — use whichever fetch/search/page-retrieval tools are available). If no Atlassian connector is available, ask the user for the exported page content before writing files.
- Save to `raw/confluence/<Page Title>.md` with frontmatter:
---
source_url: <url>
fetched: YYYY-MM-DD HH:mm:ss
last_modified: YYYY-MM-DD # the page's last-updated / version date from the fetch metadata
created: YYYY-MM-DD # the page's creation date, if the fetch exposes it
---
  `last_modified` is the real content date (Confluence's "Last updated" / version date). Always record it when available: it gives the page an accurate, medium-confidence `date` instead of falling back to the `fetched` sync timestamp.
- Continue with per-note ingestion for that file (as a single-file session; write to `.import/batch-log-1.jsonl`).
- Assign freshness dates inline (don't wait for finalize): after all Wiki pages are written, run the date pass so the new raw page and every created/updated Wiki page get `date`/`date_span`/`date_confidence` immediately. Run the full pass (idempotent, only rewrites pages whose dates changed; no need to enumerate paths):

      python3 scripts/system/wiki-assign-dates.py --apply

  (To date only specific files instead, append `--paths "raw/confluence/<Page>.md" "wiki/<topic>/<page>.md" …`.)
- Then tell the user to run `finalize ingest` to rebuild indexes and re-index search.
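
As a rough illustration of the frontmatter described in the save step, the sketch below builds the block from fetch metadata. The function name and parameters are hypothetical; the only behaviour taken from the text is the set of keys, the date formats, and that `last_modified` and `created` are recorded only when the fetch exposes them.

```python
from datetime import date, datetime
from typing import Optional


def confluence_frontmatter(
    source_url: str,
    fetched: datetime,
    last_modified: Optional[date] = None,
    created: Optional[date] = None,
) -> str:
    """Return the frontmatter block for a cached Confluence page."""
    lines = [
        "---",
        f"source_url: {source_url}",
        f"fetched: {fetched:%Y-%m-%d %H:%M:%S}",
    ]
    # Optional keys are omitted rather than guessed when the metadata is missing.
    if last_modified is not None:
        lines.append(f"last_modified: {last_modified:%Y-%m-%d}")
    if created is not None:
        lines.append(f"created: {created:%Y-%m-%d}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Leaving out an unknown `last_modified` (instead of copying `fetched` into it) keeps the later date pass from mistaking a sync timestamp for a content date.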
Refresh: "refresh this Confluence page" → re-fetch, overwrite cache, diff vs previous, flag changes affecting existing Wiki pages.
Slack ingestion
For Slack channels and DMs, use the wiki-fetch-slack skill instead of this one.
Trigger phrases: "fetch Slack", "ingest Slack", "sync Slack channels".
wiki/log.jsonl format
wiki/log.jsonl is append-only. One JSON object per line, sorted oldest-to-newest by append order.
{"date":"YYYY-MM-DD HH:mm:ss","session":1,"file":"raw/notes/meeting-2026-03-01.md","summary":"Quarterly planning meeting notes.","pages_created":["wiki/decisions/adopt-vector-tiles.md","wiki/projects/AutoStream.md"],"pages_updated":["wiki/people/Jane Smith.md"]}
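
A minimal sketch of appending one entry in this format is shown below. The helper name and its parameters are hypothetical; the field names and date format follow the example line above, and compact separators are used only to match its appearance.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def append_log_entry(
    log_path: Path,
    session: int,
    file: str,
    summary: str,
    pages_created: Iterable[str],
    pages_updated: Iterable[str],
    now: Optional[datetime] = None,
) -> str:
    """Append one JSON object as a single line and return that line."""
    entry = {
        "date": (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S"),
        "session": session,
        "file": file,
        "summary": summary,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    line = json.dumps(entry, ensure_ascii=False, separators=(",", ":"))
    # Mode "a" only ever adds to the end, which preserves append order.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```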
Finding un-ingested notes: `jq -r '.file' wiki/log.jsonl` lists all ingested paths; any raw file not in that list has not been ingested yet.
Fallback without jq (requires GNU grep with PCRE support): `grep -oP '"file":"\K[^"]+' wiki/log.jsonl`
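
The same lookup can be sketched in Python for environments without `jq` or GNU grep. The function names are hypothetical, and the candidate list is assumed to be supplied by the caller (for example from a directory listing of `raw/`).

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def ingested_paths(log_path: Path) -> Set[str]:
    """Collect every "file" value recorded in the log."""
    paths: Set[str] = set()
    if not log_path.exists():
        return paths
    with log_path.open(encoding="utf-8") as fh:
        for raw_line in fh:
            raw_line = raw_line.strip()
            if not raw_line:
                continue  # tolerate blank lines
            entry = json.loads(raw_line)
            if "file" in entry:
                paths.add(entry["file"])
    return paths


def uningested(candidates: Iterable[str], log_path: Path) -> List[str]:
    """Return the candidate paths that do not appear in the log, sorted."""
    seen = ingested_paths(log_path)
    return sorted(p for p in candidates if p not in seen)
```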
Notes
- A single note may touch 5–25+ Wiki pages. That is expected and desirable.
- Never re-ingest a file already present in `wiki/log.jsonl` without explicit user confirmation.
- It is safe for parallel sessions to write to the same Wiki page: the second session reads the already-updated page and extends it further.