---
name: paperbridge
description: Use when a task involves Zotero (search, collections, items, PDFs, full-text), DOI/Crossref resolution, searching external paper sources (arXiv, HuggingFace Papers, Semantic Scholar, OpenAlex, etc.), or retrieving locally cached papers from the Paperseed corpus. Provides both a CLI (paperbridge ...) and an MCP server (paperbridge serve). Prefer MCP tools when available; fall back to CLI invocation otherwise.
---
# paperbridge
Rust CLI + MCP server bridging Zotero (cloud or desktop local API), external paper indexes, and a local Paperseed cache. Use it for literature search, reference resolution, structured paper parsing, and preparing paper content for downstream agents.
MCP availability. When connected to `paperbridge serve`, this guide is also served as the prompt `paperbridge_skill` (`prompts/get` with `name: "paperbridge_skill"`).
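A client that talks to the server directly fetches this prompt with a standard MCP JSON-RPC 2.0 request. The minimal sketch below only builds that message. The request `id` is arbitrary. It leaves out the transport and the `initialize` handshake that must come first.

```python
import json

def prompts_get_request(name: str, request_id: int = 1) -> str:
    """Build an MCP `prompts/get` JSON-RPC 2.0 request for a named prompt."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,          # any id the client has not used yet
        "method": "prompts/get",
        "params": {"name": name},  # the prompt name exposed by the server
    }
    return json.dumps(message)

print(prompts_get_request("paperbridge_skill"))
```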
## When to use
- Search a Zotero library or browse collections, tags, attachments.
- Resolve a DOI to structured metadata (title, authors, year, journal, abstract).
- Search external paper indexes: arXiv, Crossref, OpenAlex, Europe PMC, DBLP, OpenReview, PubMed, HuggingFace Papers, Semantic Scholar, CORE, NASA ADS, ScholarAPI.
- Retrieve full-text or structured content from a Zotero attachment or a cached paper.
- Validate, create, update, or delete Zotero items and collections.
- Import or query the local Paperseed corpus (requires `paperseed_enabled = true`).
## Modes

- MCP (preferred in agent contexts): use the registered `paperbridge` MCP server tools directly; they mirror the CLI commands below.
- CLI: `paperbridge <domain> <action>`. All data commands print JSON on stdout; errors go to stderr. Pipe through `| jq` for inspection.
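When a script drives the CLI, the "JSON on stdout, errors on stderr" contract fits a small subprocess wrapper. This is a minimal sketch. `run_json` is a hypothetical helper name, and it assumes that a non-zero exit status means the command failed.

```python
import json
import subprocess
from typing import Any

def run_json(argv: list[str]) -> Any:
    """Run a command that prints JSON on stdout and diagnostics on stderr."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface stderr so the caller sees the command's own error message.
        raise RuntimeError(f"{argv[0]} exited {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)

# Example (requires paperbridge on PATH):
# result = run_json(["paperbridge", "papers", "search", "-q", "diffusion models", "--max-results", "5"])
```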
## First-time setup

```bash
paperbridge config init --interactive
paperbridge config validate
paperbridge status
```

Backend modes: `cloud` (api.zotero.org, needs `api_key` + `user_id`), `local` (Zotero Desktop at http://127.0.0.1:23119, no key), `hybrid` (local reads, cloud writes).
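The three modes differ only in where reads and writes are sent. The function below is a hypothetical sketch of that rule (`route` is not part of paperbridge). Rejecting writes in `local` mode is an assumption. It rests on Zotero Desktop's local API being read-only, which is presumably why `hybrid` exists.

```python
LOCAL_API = "http://127.0.0.1:23119"  # Zotero Desktop local API
CLOUD_API = "https://api.zotero.org"  # Zotero Web API

def route(backend_mode: str, is_write: bool) -> str:
    """Return the Zotero endpoint a request would use under backend_mode."""
    if backend_mode == "cloud":
        return CLOUD_API
    if backend_mode == "local":
        if is_write:
            # Assumption: the desktop local API is read-only, so local mode cannot write.
            raise PermissionError("local backend is read-only; use cloud or hybrid for writes")
        return LOCAL_API
    if backend_mode == "hybrid":
        # Hybrid: reads go to Zotero Desktop, writes go to the cloud API.
        return CLOUD_API if is_write else LOCAL_API
    raise ValueError(f"unknown backend_mode: {backend_mode!r}")
```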
## Core recipes

### Search: library, external, and cached

```bash
# Zotero library
paperbridge library query -q "diffusion models" --limit 10
# External papers + local cache (cached results prioritized first)
paperbridge papers search -q "intrusion detection" --limit 3 --max-results 10
paperbridge papers search -q "attention is all you need" --sources arxiv,semantic_scholar
# Paginated (agents should page through large result sets)
paperbridge papers search -q "transformers" --max-results 5 --offset 10
```
Results are deduplicated by DOI → arXiv ID → PMID → normalized title + first author. Cached papers appear with `source: "paperseed"` and a `cache.cached` annotation, and all cached hits are sorted ahead of external results.
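The sketch below illustrates the deduplication and ordering rules. It puts cached hits first and then drops any hit that shares an identity key with an earlier one. Several parts are assumptions and not paperbridge's schema: the hit field names (`doi`, `arxiv_id`, `pmid`, `title`, `authors`), the normalization, and the choice to treat a match on *any* key as a duplicate. Only `source` and `cache.cached` come from the text above.

```python
import re

def identity_keys(hit: dict) -> list[tuple]:
    """All identity keys for a hit, in precedence order: DOI, arXiv ID, PMID, title."""
    keys = []
    for field in ("doi", "arxiv_id", "pmid"):
        value = hit.get(field)
        if value:
            keys.append((field, str(value).strip().lower()))
    # Normalized title: lowercase, runs of non-alphanumerics collapsed to one space.
    title = re.sub(r"[^a-z0-9]+", " ", (hit.get("title") or "").lower()).strip()
    if title:
        authors = hit.get("authors") or [""]
        keys.append(("title", title, authors[0].strip().lower()))
    return keys

def merge_hits(hits: list[dict]) -> list[dict]:
    """Put cached hits first, then drop hits sharing a key with an earlier one."""
    # Stable sort: cached hits move ahead, otherwise the original order is kept.
    ordered = sorted(hits, key=lambda h: not (h.get("cache") or {}).get("cached", False))
    seen: set = set()
    merged = []
    for hit in ordered:
        keys = identity_keys(hit)
        if any(key in seen for key in keys):
            continue  # duplicate of a hit already kept
        seen.update(keys)
        merged.append(hit)
    return merged
```

Sorting before deduplicating means that when a paper is both cached and found externally, the cached copy is the one kept.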
MCP tool: `search_papers { query, limit_per_source?, sources?, offset?, limit? }`. It returns `{ query, total_count, offset, limit, hits: [...] }`. Use `offset` and `limit` to page through large result sets.
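Here is a sketch of a paging loop over that response shape. `call_search` is a hypothetical stand-in for however your client invokes the `search_papers` tool. The loop stops when a page comes back empty or `offset` reaches `total_count`.

```python
from typing import Callable, Iterator

def iter_hits(call_search: Callable[..., dict], query: str, page_size: int = 10) -> Iterator[dict]:
    """Yield every hit by advancing offset/limit until total_count is reached."""
    offset = 0
    while True:
        page = call_search(query=query, offset=offset, limit=page_size)
        hits = page.get("hits", [])
        yield from hits
        offset += len(hits)
        # An empty page also ends the loop, guarding against a stale total_count.
        if not hits or offset >= page.get("total_count", 0):
            break
```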
Available `sources` values: `arxiv`, `paperseed` (local cache), `crossref`, `openalex` (`oa`), `europe_pmc` (`epmc`), `dblp`, `openreview` (`or`), `pubmed` (`pm`), `hugging_face` (`hf`), `semantic_scholar` (`s2`), `core`, `ads` (`nasa_ads`), `scholarapi` (`scholar`).

Always-on (no key): arXiv, Crossref, OpenAlex, Europe PMC, DBLP, OpenReview, PubMed. Key-gated (silently skipped when their key is unset): HuggingFace, Semantic Scholar, CORE, NASA ADS, ScholarAPI.
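The sketch below shows how a `sources` list might be resolved against the aliases and key gates above. The alias table is copied from the list above. The source-to-config-key mapping is an assumption: it is inferred from the key names in the config table further down.

```python
ALIASES = {
    "oa": "openalex", "epmc": "europe_pmc", "or": "openreview", "pm": "pubmed",
    "hf": "hugging_face", "s2": "semantic_scholar", "nasa_ads": "ads", "scholar": "scholarapi",
}
# Assumed mapping from key-gated source to the config key that enables it.
KEY_GATED = {
    "hugging_face": "hf_token",
    "semantic_scholar": "semantic_scholar_api_key",
    "core": "core_api_key",
    "ads": "ads_api_token",
    "scholarapi": "scholarapi_key",
}

def active_sources(requested: str, config: dict) -> list[str]:
    """Expand aliases, drop duplicates, and silently skip sources lacking a key."""
    active = []
    for raw in requested.split(","):
        name = ALIASES.get(raw.strip(), raw.strip())
        gate = KEY_GATED.get(name)
        if gate and not config.get(gate):
            continue  # key-gated source with no key configured: skip silently
        if name and name not in active:
            active.append(name)
    return active
```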
### Resolve a DOI

```bash
paperbridge papers resolve-doi --doi "10.1038/nature12373"
```

When `unpaywall_email` is configured, the response includes `oa_pdf_url`.
### Read full text: Zotero or cached paper

```bash
# Zotero attachment
paperbridge library read --item-key ABCD1234
paperbridge library read --item-key ABCD1234 --attachment-key PDF5678
# Search then read (picks best attachment)
paperbridge library read-search -q "sparse attention" --result-index 0 --search-limit 5
```
Cache fallback: `get_pdf_text` and `get_item_fulltext` automatically search the local Paperseed cache when Zotero is unreachable. Pass a title, DOI, or paper ID as the key; the route treats it as a natural-language query against cached papers. If a matching cached paper has extracted full text, that text is returned directly.
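The fallback order can be sketched as "try Zotero, then query the cache". Both callables are hypothetical stand-ins: one for the Zotero fetch and one for the Paperseed query. Two further details are assumptions: treating `ConnectionError` as "Zotero unreachable", and the `fulltext` field name.

```python
from typing import Callable, Optional

def fulltext_with_fallback(
    key: str,
    fetch_zotero: Callable[[str], str],
    query_cache: Callable[[str], list[dict]],
) -> Optional[str]:
    """Return Zotero full text, else the first cached match with extracted text."""
    try:
        return fetch_zotero(key)
    except ConnectionError:
        pass  # Zotero unreachable: fall through to the local cache
    # The key (title, DOI, or paper ID) is used as a natural-language query.
    for paper in query_cache(key):
        if paper.get("fulltext"):
            return paper["fulltext"]
    return None  # no cached paper with extracted text
```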
MCP tools:

- `get_pdf_text { attachment_key }`: Zotero attachment or cache query
- `get_item_fulltext { attachment_key }`: same fallback behavior
- `prepare_vox_text { text?, attachment_key?, max_chars_per_chunk? }`: chunks for Vox
- `prepare_item_for_vox { item_key, attachment_key?, max_chars_per_chunk? }`: prefers cached papers
- `prepare_search_result_for_vox { q, result_index?, ... }`: search → cached-paper check → Zotero fallback
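`max_chars_per_chunk` caps the size of each chunk handed to Vox. The sketch below shows one plausible splitting strategy. It prefers paragraph breaks and hard-splits only paragraphs that exceed the limit. paperbridge's actual boundary rules are not documented here, so treat this strategy as an assumption.

```python
def chunk_text(text: str, max_chars_per_chunk: int) -> list[str]:
    """Split text into chunks of at most max_chars_per_chunk characters."""
    if max_chars_per_chunk <= 0:
        raise ValueError("max_chars_per_chunk must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip blank paragraphs
        # Hard-split any paragraph longer than the limit into fixed-size pieces.
        for i in range(0, len(para), max_chars_per_chunk):
            piece = para[i:i + max_chars_per_chunk]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars_per_chunk:
                current = candidate  # piece still fits in the open chunk
            else:
                chunks.append(current)  # close the open chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```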
### Structured paper content

Returns a typed JSON structure with sections, references, and figures. Works with both Zotero items and cached paper IDs.

```bash
paperbridge papers structure --key ABCD1234
paperbridge papers query --key ABCD1234 --selector "sections[0].text"
paperbridge papers query --key ABCD1234 --selector "metadata.doi"
```
MCP tools: `get_paper_structure { item_key, attachment_key? }` and `query_paper { item_key, selector, attachment_key? }`. Both accept Zotero keys or cached paper IDs. When a cached paper has no extracted full text, metadata is still returned with empty sections (no 404s).

Selectors use dotted paths with bracket indexing (`sections[2].text`, `references[0].title`). The `source` field tells you the provenance: `grobid`, `zotero_fulltext`, or `grobid_unavailable`.
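Here is a small evaluator that makes the selector syntax concrete. It resolves dotted paths with bracket indexing over parsed JSON. It is an illustrative re-implementation, not paperbridge's parser. It supports only the segment forms shown above: `name` and `name[index]`, with indexes optionally repeated.

```python
import re
from typing import Any

_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)((?:\[\d+\])*)$")

def select(doc: Any, selector: str) -> Any:
    """Resolve a selector like 'sections[2].text' against nested dicts/lists."""
    node = doc
    for segment in selector.split("."):
        match = _SEGMENT.match(segment)
        if not match:
            raise ValueError(f"bad selector segment: {segment!r}")
        name, indexes = match.groups()
        node = node[name]
        # Apply each [n] index in order, e.g. 'a[0][1]'.
        for index in re.findall(r"\[(\d+)\]", indexes):
            node = node[int(index)]
    return node
```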
### Local Paperseed corpus

Manage the content-addressed local cache and license-gated seed manifests:

```bash
paperbridge paperseed corpus status
paperbridge paperseed corpus import ./paper.pdf --license cc-by
paperbridge paperseed corpus ingest --metadata item.json --file paper.pdf --license cc-by
paperbridge paperseed corpus query -q "induction heads"
paperbridge paperseed corpus export --format bibtex
paperbridge paperseed seed check --paper-id <id>
paperbridge paperseed seed create --paper-id <id>
```

Imported PDFs have their text extracted automatically and stored in the corpus for full-text search. YAMS provides an experimental storage/search backend when `paperseed_yams_enabled = true`.
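"Content-addressed" means an entry's identifier is derived from the file's bytes. Re-importing the same PDF therefore maps to the same entry. The sketch below uses SHA-256. The hash algorithm, the `sha256:` prefix, and the sharded directory layout are all assumptions for illustration, not paperbridge's on-disk format.

```python
import hashlib
from pathlib import Path

def content_address(pdf_path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return a stable content identifier."""
    digest = hashlib.sha256()
    with pdf_path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return "sha256:" + digest.hexdigest()

def storage_path(corpus_root: Path, address: str) -> Path:
    """Shard by the first two hex characters to keep directories small."""
    hexdigest = address.split(":", 1)[1]
    return corpus_root / hexdigest[:2] / hexdigest
```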
### Write Zotero items & collections

Write operations take a JSON file on disk. The cloud backend requires an `api_key` with write scope.

```bash
paperbridge item validate --file item.json --online
paperbridge item create --file item.json
paperbridge item update --file item.json
paperbridge item delete --file item.json
paperbridge collection create --name "ML 2025"
```
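Because the write commands read a JSON file, a script can generate that file before calling `paperbridge item create --file item.json`. The sketch below writes one item in the Zotero Web API item shape (`itemType`, `title`, `creators`, `DOI`, plus `key`/`version` for updates). The helper `write_item_file` is hypothetical. It is also an assumption that paperbridge expects exactly this shape (a single object rather than an array), so run `paperbridge item validate` first.

```python
import json
from pathlib import Path
from typing import Optional

def write_item_file(path: Path, title: str, authors: list[tuple[str, str]],
                    doi: Optional[str] = None, key: Optional[str] = None,
                    version: Optional[int] = None) -> dict:
    """Write a Zotero-style journalArticle item to `path` and return it."""
    item = {
        "itemType": "journalArticle",
        "title": title,
        "creators": [
            {"creatorType": "author", "firstName": first, "lastName": last}
            for first, last in authors
        ],
    }
    if doi:
        item["DOI"] = doi
    # Updates and deletes identify the item and carry its last-seen version.
    if key is not None:
        item["key"] = key
    if version is not None:
        item["version"] = version
    path.write_text(json.dumps(item, indent=2), encoding="utf-8")
    return item
```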
### Run as MCP server

```bash
paperbridge serve
paperbridge config snippet --target claude
paperbridge config snippet --target opencode
```
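`config snippet` generates the client configuration for you. For orientation, Claude Desktop registers a server under `mcpServers` with a `command` and `args`, and the sketch below builds that shape by hand. Treat the actual output of `paperbridge config snippet --target claude` as authoritative. It may include fields not shown here, such as environment variables.

```python
import json

def claude_mcp_snippet(binary: str = "paperbridge") -> str:
    """Build an mcpServers entry that launches `paperbridge serve`."""
    config = {"mcpServers": {"paperbridge": {"command": binary, "args": ["serve"]}}}
    return json.dumps(config, indent=2)

print(claude_mcp_snippet())
```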
## Key config keys

| key | purpose |
|---|---|
| `backend_mode` | `cloud`, `local`, `hybrid` |
| `api_key` | Zotero API key; redacted in `config get` unless `--show-secret` |
| `user_id` | numeric Zotero user ID |
| `group_id` | numeric group ID (optional) |
| `library_type` | `user` or `group` |
| `paperseed_enabled` | enable local Paperseed corpus (default `false`) |
| `paperseed_auto_download` | automatically mirror OA PDFs into local corpus (default `true`) |
| `paperseed_yams_enabled` | use YAMS as experimental storage/search backend (default `true`) |
| `paperseed_corpus_root` | override corpus path |
| `hf_token`, `semantic_scholar_api_key`, `core_api_key`, `ads_api_token`, `scholarapi_key` | gate the key-gated external sources |
| `ncbi_api_key` | optional PubMed rate-limit upgrade |
| `unpaywall_email` | enables OA-PDF enrichment |
| `grobid_url` | GROBID endpoint; if set, auto-spawn is disabled |
| `grobid_auto_spawn` | launch GROBID via Docker (default `false`) |
| `grobid_image` | Docker image for auto-spawn |
| `log_level` | `error`, `warn`, `info`, `debug`, `trace` |
`paperbridge config get` masks secrets by default. Pass `--show-secret` to reveal them.
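Several rows in the table constrain values, and the Gotchas below add an HTTPS rule for `api_base`. The sketch below is a hypothetical pre-flight check an agent could run on a parsed config dict. `paperbridge config validate` remains the real check. The rules here are reconstructed from this page only. The defaults assumed for missing keys (`cloud`, `info`) are guesses. So is the requirement that `hybrid` also needs cloud credentials, and the inclusion of `127.0.0.1` as a local host.

```python
from urllib.parse import urlparse

BACKEND_MODES = {"cloud", "local", "hybrid"}
LOG_LEVELS = {"error", "warn", "info", "debug", "trace"}

def preflight(config: dict) -> list[str]:
    """Return human-readable problems found in a config dict (empty if none)."""
    problems = []
    mode = config.get("backend_mode", "cloud")  # assumed default
    if mode not in BACKEND_MODES:
        problems.append(f"backend_mode must be one of {sorted(BACKEND_MODES)}")
    if config.get("log_level", "info") not in LOG_LEVELS:  # assumed default
        problems.append(f"log_level must be one of {sorted(LOG_LEVELS)}")
    if mode in {"cloud", "hybrid"}:
        # Cloud access needs an API key and a user ID.
        for required in ("api_key", "user_id"):
            if not config.get(required):
                problems.append(f"{required} is required for backend_mode={mode}")
    api_base = config.get("api_base")
    if api_base:
        url = urlparse(api_base)
        local_http = url.scheme == "http" and url.hostname in {"localhost", "127.0.0.1"}
        if url.scheme != "https" and not local_http:
            problems.append("api_base must use HTTPS (http is allowed only for localhost)")
    return problems
```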
## Gotchas

- Cloud `api_base` must be HTTPS (or `http://localhost` for local mode).
- Search results are paginated; use `offset`/`limit` to page through large sets. The `total_count` field gives the total number of hits, so you can tell how many remain beyond the current page.
- Cached papers are prioritized first in search results, regardless of the `--sources` filter. Look for `cache.cached: true` and `source: "paperseed"`.
- PDF text extraction happens automatically during local corpus import; no separate step is needed.
- Read output can be large; always set `--max-chars-per-chunk` when feeding it into an LLM.
- Write operations need `version` on update/delete (Zotero optimistic concurrency). If you get HTTP 412, re-fetch the item and retry.
- `config get api_key` no longer prints the raw key; it prints `(set, N chars — pass --show-secret to reveal)`.
- Legacy flat commands (`query`, `create-item`, `backend-info`, `search-papers`, …) still work but emit a deprecation warning. Prefer the canonical domain paths.
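The re-fetch-and-retry pattern for HTTP 412 can be sketched as a small loop. `fetch_item` and `put_item` are hypothetical callables, for example thin wrappers around the CLI or the Zotero Web API. `put_item` is assumed to return an HTTP status code.

```python
from typing import Callable

def update_with_retry(
    item_key: str,
    apply_change: Callable[[dict], dict],
    fetch_item: Callable[[str], dict],
    put_item: Callable[[dict], int],
    max_attempts: int = 3,
) -> dict:
    """Apply a change using optimistic concurrency, retrying on HTTP 412."""
    for _ in range(max_attempts):
        current = fetch_item(item_key)          # carries the latest "version"
        updated = apply_change(dict(current))   # caller edits a shallow copy
        updated["version"] = current["version"] # version the edit was based on
        status = put_item(updated)
        if status == 412:
            continue  # item changed since we fetched it: re-fetch and re-apply
        if 200 <= status < 300:
            return updated
        raise RuntimeError(f"update failed with HTTP {status}")
    raise RuntimeError(f"gave up after {max_attempts} attempts (HTTP 412 each time)")
```

Re-applying the change to a freshly fetched copy, rather than resending the stale one, is what keeps a concurrent edit from being overwritten.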
## Verify install

```bash
paperbridge --version
paperbridge status
paperbridge config validate
```
## Contributors

CLI surface changes must be reviewed against `docs/design/cli-design.md`.