name: wiki-fetch-readwise-highlights description: Mine Readwise highlights via vector search, group by parent document, and write highlight collections into the wiki's raw/ without loading them into context, then chain into wiki-ingest. Called by wiki-import-readwise; can also be used directly.
Fetch Readwise highlights
Pull relevance-filtered highlights from the user's Readwise library (books, articles, tweets, podcasts) into raw/ as per-parent-document markdown files, then hand them to wiki-ingest. The unit is a parent doc; the content is only the highlights that matched one or more queries.
Highlights are valuable because they're already user-curated and compact — a book with 80 highlights is ~3–6k tokens vs. the full book being unusable.
Preconditions
readwiseCLI installed and authenticated (seewiki-import-readwiseStep 0).jqinstalled.raw/directory exists.wiki/home.mdhas a clear through-line, or the user has told you the research frame.
Step 1 — Formulate vector search queries
Read wiki/home.md and scan wiki/index.md for open questions. Formulate 5–10 candidate queries covering: core topic terms, adjacent debate vocabulary, named entities, phenomena/mechanisms, contrarian framings.
Show the query list to the user before searching. Let them add/remove/adjust. Don't silently batch.
Step 2 — Run vector searches
For each query, redirect output to a temp file — do not pipe to stdout:
readwise readwise-search-highlights --vector-search-term "<query>" --limit 30 --json > /tmp/rwhl_query_<N>.json
Batch all queries in one shell call.
Step 3 — Merge, dedupe, group by parent document
jq -s '
[ .[] | .[] ]
| unique_by(.id)
| group_by(.attributes.document_title + "|" + .attributes.document_author)
| map({
title: .[0].attributes.document_title,
author: .[0].attributes.document_author,
category: .[0].attributes.document_category,
doc_tags: .[0].attributes.document_tags,
match_count: length,
top_score: (map(.score) | max),
highlights: [ .[] | { id, score, text: .attributes.highlight_plaintext, note: .attributes.highlight_note, tags: .attributes.highlight_tags }]
})
| sort_by(-.match_count, -.top_score)
' /tmp/rwhl_query_*.json > /tmp/rwhl_grouped.json
Report: number of unique parent docs, top 10 by match count. Ask the user to confirm the inclusion threshold (default: match_count >= 2, or match_count == 1 && top_score > 0.5). Let them prune off-topic results before writing files.
Step 4 — Write per-doc raw files
For each parent doc passing the threshold, write raw/<slug>_highlights.md (the _highlights suffix distinguishes these from full-document raws):
jq -c '.[] | select(.match_count >= 2 or (.match_count == 1 and .top_score > 0.5))' /tmp/rwhl_grouped.json \
| while IFS= read -r doc; do
slug=$(echo "$doc" | jq -r '(.author // "unknown" | ascii_downcase | gsub("[^a-z0-9]+"; "-") | .[0:20]) + "_" + (.title | ascii_downcase | gsub("[^a-z0-9]+"; "-") | .[0:35]) + "_highlights"')
{
echo "$doc" | jq -r '"# Highlights: " + .title + "\n\n**Author:** " + .author + "\n**Category:** " + .category + "\n**Document tags:** " + (.doc_tags | if length == 0 then "none" else join(", ") end) + "\n**Match count:** " + (.match_count|tostring) + "\n**Top score:** " + (.top_score|tostring) + "\n\n> Note: matched highlights only, not every highlight in the doc.\n\n---\n"'
echo "$doc" | jq -r '.highlights[] | "> " + (.text | gsub("\n"; "\n> ")) + "\n" + (if .note != "" and .note != null then "**Note:** " + .note + "\n" else "" end) + (if (.tags // [] | length) > 0 then "*Tags: " + (.tags | join(", ")) + "*\n" else "" end) + "\n---\n"'
} > "raw/$slug.md"
done
Step 5 — Verify without reading the bodies
ls -la raw/*_highlights.md && wc -l raw/*_highlights.md
Do not cat or re-open a highlights file unless the user asks.
Step 6 — Report, then chain into ingest
Report: parent docs landed, filenames, match counts, any docs dropped. Then invoke wiki-ingest on all new *_highlights.md raws. Tell the ingest step these are highlight collections — cite individual highlights rather than re-paraphrasing.
JSON shape (don't re-probe)
readwise-search-highlights --json→ top-level array. Items:id,score,attributes.{document_title, document_author, document_category, highlight_plaintext, highlight_note, highlight_tags}.
Rules
- Never send
readwise-search-highlightsoutput to stdout — always redirect to/tmp/. - Never write JSON to
raw/— raw files are markdown only; temp JSON goes in/tmp/. - Never re-open a
raw/file you just wrote unless the user asks. - Always confirm the query set with the user before searching.
- Always chain into
wiki-ingestunless the user said "just fetch." - A 30-doc haul is probably too many; 5–15 is a normal batch.