name: kb-import description: | Import knowledge from existing documents into structured KB entries. Reads source documents (Markdown, PDF, DOCX, plain text), extracts key information, and creates properly formatted KB entries with YAML frontmatter.
KB Import Workflow
Import knowledge from existing documents into your knowledge base.
When to Use
- Adding knowledge from existing documentation
- Converting unstructured docs into structured KB entries
- Bulk-importing content into a new KB
Modes
- Single-document mode (default): one source document is split into one or more KB entries. Use Steps 1 to 6 below.
- Bulk mode: many source documents are ingested at once from a directory or a list of files. Use when the user points at a folder or provides a list longer than ~3 files. See Bulk Mode at the bottom.
Step 1: Understand the KB Structure
Read the KB config to understand available categories:
kb/.kb-config.yaml
Read the index to see what already exists:
kb/index.md
Step 2: Read the Source Document
Read the source file provided by the user. Supported formats:
- Markdown (.md)
- PDF (.pdf, use the Read tool with page ranges for large files)
- Plain text (.txt)
Step 3: Plan the Extraction
Analyze the document and propose a plan to the user:
- How many KB entries should be created?
- What categories do they belong to?
- Suggested titles for each entry
Present this as a table:
| # | Title | Category | Source Section |
|---|-------|----------|---------------|
| 1 | ... | ... | ... |
Wait for user confirmation before proceeding.
Step 4: Create KB Entries
For each planned entry, create a markdown file with YAML frontmatter:
---
title: "Entry Title"
description: "Brief one-liner for index lookup"
category: {category}
tags: [{tag1}, {tag2}]
sources: ["{source_filename}"]
last_updated: "{today's date}"
related:
- {category}/{related-file}.md
---
## Section Title
Content here. Write clear, quotable statements.
Each fact should be a self-contained sentence that can be cited as evidence.
Content Guidelines
- Preserve specifics: Keep exact numbers, dates, names, versions. Keep concrete customer/product examples by name (e.g., "Acme Corp", "Globex") — they make abstract concepts tangible and shouldn't be stripped "for neutrality".
- One topic per entry: Don't create catch-all files
- Quotable statements: Write so that individual sentences can be cited as evidence
- Capture the easily-missed content types when the source covers them: stakeholders (one entry per key person with role + ownership + contact pattern), projects (goal/owner/status), repositories (purpose/ownership). These are the most commonly skipped in first-pass imports.
- No opinions or speculation: Only include facts from the source document
- Use markdown structure: Headers, bullet points, tables for structured data
File Naming
- Use lowercase with hyphens:
data-encryption.md,product-overview.md - Name should reflect the topic, not the source document
Step 5: Update the Index and Validate
After creating entries, regenerate the index and validate:
python3 scripts/kb-index.py --write # rewrite kb/index.md's "All Files by Category"
python3 scripts/kb-validate.py # check frontmatter, categories, related links
Review the stdout output to verify all new entries appear correctly. Resolve any validate errors before continuing.
Step 6: Summary
Report to the user:
- How many entries were created
- Which categories they were placed in
- Any information from the source document that was skipped (and why)
- Suggestion to review entries and add
related:links between them
Bulk Mode
Use this when the user wants to ingest many documents in one go (e.g., "import everything in ~/docs/policies/", or a list of 5+ files).
Bulk Step 1: Enumerate the source set
- If the user provided a directory, list supported files in it recursively (
.md,.pdf,.txt,.docx). Skip obvious noise (.DS_Store,node_modules, hidden files). - If the user provided a list of paths, use exactly those.
- Present the file count and a sample (first 10) to the user. Confirm before reading anything heavy.
Bulk Step 2: Plan across the whole batch
Read the frontmatter / first page of each file to get a title guess. Produce a single combined plan:
| # | Source file | Proposed KB entry | Category |
|---|-------------|-------------------|----------|
| 1 | policies/acceptable-use.pdf | security/acceptable-use.md | security |
| 2 | policies/retention.pdf | security/data-retention.md | security |
| ...
Rules:
- One KB entry per source file by default. Split a source into multiple entries only when it clearly covers multiple distinct topics.
- Prefer nested categories (e.g.,
security/access) when the batch is large enough that a flat category would become unwieldy (> ~10 entries in one category). - Flag duplicates up front: if a planned entry already exists in the KB, mark it "UPDATE" instead of "CREATE".
Wait for user confirmation on the full plan before proceeding.
Bulk Step 3: Process in parallel
- For ≤ 5 files, process sequentially (easier to follow, fewer context switches).
- For > 5 files, dispatch a subagent per file (or per small group of related files) with the import instructions, the target path from the plan, and the existing KB index as context. Collect results.
- If any subagent fails, keep the successful entries and report the failures so the user can retry a smaller batch.
Bulk Step 4: Finalize
After all files are processed:
python3 scripts/kb-index.py --write
python3 scripts/kb-validate.py
python3 scripts/kb-search.py "sanity-check-term" # spot-check a term that should appear
Report: X created, Y updated, Z skipped (with reason per skip). Flag any validate warnings or errors.