---
name: wiki-fetch-readwise-document
description: Fetch one or more Readwise Reader documents into the wiki's raw/ without loading bodies into context. Streams content to disk via a jq pipe, then chains into wiki-ingest. Called by wiki-import-readwise; can also be used directly.
---
Fetch Readwise document
Grab a Reader document (or several) and drop it into raw/ without ever loading the full body into context. Only small metadata (title, author, url, date, category, doc_id) belongs in context. The body streams from the CLI to a file via a pipe.
Preconditions
- `readwise` CLI installed and authenticated (see `wiki-import-readwise` Step 0).
- `jq` installed (`brew install jq` if missing).
- A `raw/` directory exists in the wiki hub.
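A minimal preflight sketch, assuming bash. `check_preconditions` is a hypothetical helper, not part of the readwise CLI. It checks that both tools are on `PATH` and that `raw/` exists under the hub directory you pass (default: the current directory). It cannot tell whether the CLI is authenticated; that check stays with `wiki-import-readwise` Step 0.

```shell
# Hypothetical preflight for the preconditions above (presence only; the
# readwise CLI's authentication is not checked here).
check_preconditions() {
  local hub="${1:-.}" tool status=0   # hub: wiki hub root, default "."
  for tool in readwise jq; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  if [ ! -d "$hub/raw" ]; then
    echo "missing: $hub/raw/ directory" >&2
    status=1
  fi
  return "$status"
}
```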
Core pattern (single doc)
Given a Readwise URL (`https://read.readwise.io/read/<id>`), a bare `doc_id`, or a search query:
Step 1 — Resolve doc_id
If the user gave a URL, the id is the last path segment — no CLI call needed.
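A sketch of that extraction using plain bash parameter expansion, assuming Reader URLs of the form shown above; `doc_id_from_input` is a hypothetical name. It also tolerates a trailing slash, a query string, or a fragment, and passes a bare doc_id through unchanged.

```shell
# Hypothetical helper: prints the doc_id for a Reader URL or a bare doc_id.
doc_id_from_input() {
  local input="$1"
  input="${input%%[?]*}"         # drop a query string, if any
  input="${input%%#*}"           # drop a fragment, if any
  input="${input%/}"             # drop one trailing slash
  printf '%s\n' "${input##*/}"   # keep the last path segment
}
```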
For a specific title, use --title-search (--query is also required even when filtering by title):
```shell
readwise reader-search-documents --query "<title words>" --title-search "<title words>" --limit 5 --json \
  | jq -r '.[] | "\(.document_id)\t\(.title)\t\(.author)\t\(.category)"'
```
For a topical search, use --query alone; add --author-search when you know the author. Show candidates if ambiguous.
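When a search may be ambiguous, a small wrapper can auto-select a lone hit and otherwise list the candidates for the user. A sketch, assuming bash and the top-level-array shape listed under JSON shapes; `pick_doc_id` is hypothetical. Pipe the `reader-search-documents ... --json` output into it instead of into the `jq` formatter.

```shell
# Hypothetical helper: reads reader-search-documents --json output on stdin.
# One match: prints its document_id. Otherwise: lists candidates on stderr
# (id, title, author, category, tab-separated) and returns 1.
pick_doc_id() {
  local json count
  json=$(cat)
  count=$(printf '%s' "$json" | jq 'length')
  if [ "${count:-0}" -eq 1 ]; then
    printf '%s' "$json" | jq -r '.[0].document_id'
    return 0
  fi
  printf '%s' "$json" \
    | jq -r '.[] | "\(.document_id)\t\(.title)\t\(.author)\t\(.category)"' >&2
  return 1
}
```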
Step 2 — Fetch metadata (mandatory)
reader-get-document-details does not return image_url, source_url, published_date, word_count, or site_name. Pull those from reader-list-documents:
```shell
readwise reader-list-documents --id <DOC_ID> \
  --response-fields title,author,url,source_url,category,published_date,saved_at,site_name,word_count,image_url \
  --json | jq '.results[0]'
```
- `image_url`: cover image. Embed as `![](<image_url>)` in the raw header.
- `source_url`: the original URL (not the `read.readwise.io` shell). Use it as the canonical `**Source:**`.
If image_url is null, skip the image embed — don't fail the fetch.
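A sketch for pulling these fields out of the `reader-list-documents` response with defaults for missing values, assuming bash and the `{count, nextPageCursor, results: [...]}` shape documented under JSON shapes; `meta_fields` is a hypothetical helper. It emits one field per line: `unknown` for a missing field, and an empty line for a missing `image_url` so the caller can skip the embed.

```shell
# Hypothetical helper: reads reader-list-documents --json output on stdin and
# prints title, author, source_url, category, published_date, image_url,
# word_count, one per line. Only the metadata object is read, never a body.
meta_fields() {
  jq -r '
    # field(k; d): value of key k, or d when missing/null, flattened to one line
    def field(k; d): (.[k] // d) | tostring | gsub("[\n\t]"; " ");
    .results[0]
    | field("title"; "unknown"),
      field("author"; "unknown"),
      field("source_url"; "unknown"),
      field("category"; "unknown"),
      field("published_date"; "unknown"),
      field("image_url"; ""),
      field("word_count"; "unknown")'
}
```

Read the seven lines into variables with one `read -r` per line (for example `{ read -r title; read -r author; ...; } < <(... | meta_fields)`). One field per line keeps an empty `image_url` from shifting the other fields. A tab-separated `read` would collapse consecutive tabs, because tab counts as IFS whitespace.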
Filename slug: `<author-last-or-source>_<short-title-slug>.md`, lowercase, hyphen-separated, no punctuation, max 60 chars.
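One reading of the slug rule in bash. It assumes the 60-character cap applies to the name before `.md`, since the rule doesn't say which. It also assumes the caller passes `site_name` in place of an author when there is none. `slugify` and `make_slug` are hypothetical names.

```shell
# Hypothetical helpers for the filename slug rule above.
slugify() {
  # lowercase; every run of non-alphanumerics becomes one hyphen; trim end hyphens
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

make_slug() {
  local author="$1" title="$2" last slug
  last="${author##* }"                    # last word of author (or site name)
  slug="$(slugify "$last")_$(slugify "$title")"
  slug="${slug:0:60}"                     # assumed: cap excludes ".md"
  slug="${slug%-}"                        # no dangling hyphen after the cut
  printf '%s.md\n' "$slug"
}
```

For example, `make_slug "Paul Graham" "How to Do Great Work!"` prints `graham_how-to-do-great-work.md`.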
Step 3 — Stream the body to disk
The critical command. Never run reader-get-document-details without piping into jq and redirecting to a file:
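```shell
{
  # Header: small metadata only. Arguments map 1:1 onto the %s placeholders.
  printf '# %s\n\n![](%s)\n\n**Source:** %s\n**Readwise URL:** https://read.readwise.io/read/%s\n**Readwise ID:** %s\n**Date:** %s\n**Author:** %s\n**Category:** %s\n\n---\n\n' \
    "<TITLE>" "<IMAGE_URL>" "<SOURCE_URL>" "<DOC_ID>" "<DOC_ID>" "<DATE>" "<AUTHOR>" "<CATEGORY>"
  # Body: streamed straight to the file, never printed.
  readwise reader-get-document-details --document-id <DOC_ID> --json | jq -r '.content'
} > raw/<slug>.md
```
If `image_url` is null, drop both the `![](%s)\n\n` segment of the format string and the `"<IMAGE_URL>"` argument. Removing only one of them misaligns the remaining arguments with their placeholders.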
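To avoid hand-editing the format string for the null case, the header can come from a function that adds the image line only when there is one. A sketch, assuming bash. `write_header` is hypothetical and takes the same values, in the same order, as the printf arguments above. Call it in place of the `printf` inside the `{ ... } > raw/<slug>.md` group, and leave the `readwise ... | jq -r '.content'` line unchanged.

```shell
# Hypothetical helper: prints the Step 3 header, adding the cover-image line
# only when an image URL is present (empty or "null" means none).
# Args, in order: title image_url source_url doc_id date author category
write_header() {
  local title="$1" image="$2" src="$3" id="$4" date="$5" author="$6" category="$7"
  printf '# %s\n\n' "$title"
  if [ -n "$image" ] && [ "$image" != null ]; then
    printf '![](%s)\n\n' "$image"
  fi
  printf '**Source:** %s\n**Readwise URL:** https://read.readwise.io/read/%s\n**Readwise ID:** %s\n**Date:** %s\n**Author:** %s\n**Category:** %s\n\n---\n\n' \
    "$src" "$id" "$id" "$date" "$author" "$category"
}
```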
Step 4 — Verify without reading the body
```shell
wc -l raw/<slug>.md && head -n 10 raw/<slug>.md
```
`head -n 10` shows only the header you wrote. A line count of 0 means the write itself failed. A count no larger than the header (about a dozen lines) means the body came back empty.
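A slightly stronger check that still never prints the body. It counts the non-blank lines after the header's `---` separator. It also catches the literal `null` that `jq -r '.content'` emits when `.content` is missing or null. A sketch, assuming bash and awk; `check_body` is hypothetical, and it reports only the line count.

```shell
# Hypothetical Step 4 check: inspects the body without printing it.
check_body() {
  local file="$1" result
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # "<non-blank lines after the first --- line> <first such line>"
  result=$(awk 'seen && NF { n++; if (n == 1) first = $0 }
                /^---$/ { seen = 1 }
                END { print n + 0, first }' "$file")
  case "$result" in
    "0 ")     echo "empty body: $file" >&2; return 1 ;;
    "1 null") echo "body is literal null (no .content): $file" >&2; return 1 ;;
  esac
  echo "ok: $file (${result%% *} non-blank body lines)"
}
```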
Step 5 — Report, then chain into ingest
Tell the user the filename and word count (from metadata), and that the body is on disk. Do not summarize content — you haven't read it. Then invoke wiki-ingest on the raw file.
Multi-doc pattern
Resolve all doc_ids (Step 1), fetch metadata for all (Step 2), then loop:
```shell
for id in <ID1> <ID2> <ID3>; do
  slug=$(...)  # derived per-id from metadata
  {
    printf '...header...'  # same header printf as Step 3
    readwise reader-get-document-details --document-id "$id" --json | jq -r '.content'
  } > "raw/$slug.md"
done
wc -l raw/*.md
```
Hold off on ingest until all fetches finish, then ingest the batch (parallel subagents if available — see wiki-import-readwise Step 4).
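A sketch of the loop with failure handling, assuming bash. Its input is hypothetical: one `<doc_id><TAB><slug>` line per document, prepared from the Step 2 metadata. `fetch_all`, `fetch_body`, and `write_header_for` are hypothetical names; `write_header_for` is only a placeholder for the Step 3 header. `set -o pipefail` makes a failed `readwise` call fail the pipeline even when `jq` exits 0. A failed file is removed so it is never ingested as a header with no body.

```shell
# Hypothetical batch fetch for the loop above.
# Input (stdin): one "<doc_id><TAB><slug>" line per document.

# Placeholder header: swap in the Step 3 printf with real metadata.
write_header_for() {
  printf '# %s\n\n---\n\n' "$1"
}

# Stream one body; </dev/null keeps the CLI off the loop's stdin.
fetch_body() {
  readwise reader-get-document-details --document-id "$1" --json </dev/null \
    | jq -r '.content'
}

fetch_all() {
  local id slug failed=0
  while IFS=$'\t' read -r id slug; do
    [ -n "$id" ] && [ -n "$slug" ] || continue
    # pipefail (scoped to this subshell) surfaces a readwise failure
    if ! ( set -o pipefail; { write_header_for "$id"; fetch_body "$id"; } > "raw/$slug.md" ); then
      echo "fetch failed: $id" >&2
      rm -f "raw/$slug.md"   # don't leave a header-only file for ingest
      failed=1
    fi
  done
  return "$failed"
}
```

`fetch_body` reads from `/dev/null` so the CLI cannot consume the remaining lines of the loop's input.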
JSON shapes (don't re-probe these)
- `reader-search-documents --json` → top-level array. Items: `document_id`, `title`, `author`, `category`, `url`, `matches[]`.
- `reader-list-documents --json` → `{count, nextPageCursor, results: [...]}`. Access with `jq -r '.results[0] | ...'`.
- `reader-get-document-details --json` → flat object: `id`, `title`, `author`, `category`, `tags`, `notes`, `content`. Body is `.content`. No `image_url`/`source_url`/`published_date`.
CLI flag reference (don't guess)
```shell
readwise reader-get-document-details --document-id <DOC_ID> --json   # NOT --id
readwise reader-list-documents --id <DOC_ID> --json                  # NOT --document-id
readwise reader-search-documents --query "<text>" --json
readwise reader-search-documents --query "<text>" --title-search "<title>" --json
readwise readwise-search-highlights --vector-search-term "<text>" --limit 30 --json
```
Tweet caveat
When a saved tweet is a reply, Reader stores the parent thread as the document. source_url points at the actually-saved tweet; image_url is the parent author's avatar. Surface this when fetching tweet replies.
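A sketch for surfacing the caveat from metadata alone. It assumes bash and that Reader reports saved tweets with category `tweet`; `tweet_caveat` is hypothetical. Metadata doesn't say whether a tweet is a reply, so the note is phrased conditionally.

```shell
# Hypothetical helper: prints a caveat for tweets using metadata only.
# Assumes Reader's category value for saved tweets is "tweet".
tweet_caveat() {
  local category="$1" source_url="$2"
  if [ "$category" = tweet ]; then
    printf "Note: if this tweet is a reply, the stored body is the parent thread; the tweet actually saved is %s, and the cover image is the parent author's avatar.\n" "$source_url"
  fi
  return 0
}
```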
Rules
- Never run `reader-get-document-details` without `| jq -r '.content' > <file>`.
- Never re-open or `cat` a `raw/` file you just wrote unless the user asks.
- Never probe JSON shapes with `jq 'keys'`; they're documented above.
- Use the flag reference over `--help`; `--help` is a fine fallback.
- When the user names a specific title, prefer `--title-search` (alongside the required `--query`) over `--query` alone.
- Missing metadata fields → use `null`/`unknown`; do not fetch the body to find them.
- Confirm before overwriting an existing `raw/` file.
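A sketch of the overwrite rule, assuming bash. `safe_target` and its `--force` argument are hypothetical; they are not readwise CLI flags. It prints the target path when writing is safe and refuses otherwise, so the agent can ask the user first. As a backstop, bash's `set -o noclobber` makes a plain `>` redirect fail on an existing file (`>|` overrides it).

```shell
# Hypothetical guard: prints raw/<slug>.md if it is safe to write,
# otherwise refuses unless the caller explicitly passes --force.
safe_target() {
  local path="raw/$1.md" force="${2:-}"
  if [ -e "$path" ] && [ "$force" != --force ]; then
    echo "exists, confirm before overwriting: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```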