name: scholar-workflow description: Use when the user asks you to search, retrieve, or reason about scholarly sources — papers, patents, books, or standards — guides tool selection, chaining, and cross-domain workflows.
Working effectively with scholarly sources
When the user asks about academic literature, prior art, citations, patents, books, or standards, use the scholar-mcp tools as described below. The server covers four source domains: papers, patents, books, and standards. Each domain has its own tools, but they cross-reference naturally — a paper may cite a patent, a patent's NPL section may reference a paper, and a paper with an ISBN is automatically enriched with book metadata.
Choosing the right search entry point
| User intent | Tool | Key parameters |
|---|---|---|
| Find papers on a topic | search_papers |
query, year_start/year_end, fields_of_study, venue, min_citations, sort |
| Look up a specific paper | get_paper |
identifier — DOI, S2 ID, ARXIV:id, ACM:id, PMID:id |
| Find an author's work | get_author |
Numeric S2 author ID for profile, or name string for disambiguation |
| Find patents | search_patents |
query, cpc_classification, applicant, inventor, jurisdiction, date filters |
| Look up a specific patent | get_patent |
patent_number, sections (biblio, claims, description, family, legal, citations) |
| Find books | search_books |
Prefer title/author over query — dedicated indexes give better recall |
| Look up a book by ISBN | get_book |
identifier — ISBN-10, ISBN-13, OL work ID, or edition ID |
| Find a standard | search_standards |
query, optional body filter (NIST, IETF, W3C, ETSI) |
| Normalise a messy citation | resolve_standard_identifier |
raw — e.g. "rfc9000", "nist 800-53", "wcag 2.2" |
| Resolve a mixed list of IDs | batch_resolve |
identifiers — mix of DOIs, S2 IDs, patent numbers, ISBN: prefixed ISBNs |
Domain-sticky traversal: a paper's references mostly point to other papers; a patent's family members are other patents; a standard often cites other standards. Follow the domain naturally — don't switch tools mid-chain unless the data leads you there (e.g. a patent's NPL citations resolve to papers).
Literature search strategies
Broad survey
search_paperswith a broad query. Usesort="citations"andmin_citationsto surface influential work quickly.- Pick the most relevant hit and call
get_citationsto see who built on it. - Call
get_referenceson the same paper to see what it builds on. - If the topic spans disciplines, use
fields_of_studyto split searches (e.g. search once for "Computer Science", once for "Medicine").
Targeted lookup
- When the user gives a DOI, arXiv ID, or other identifier, go straight to
get_paper. Don't search first. - For author-centric queries,
get_authorwith a name returns up to 5 disambiguation candidates. Ask the user to pick before fetching publications — don't guess.
Patent prior art
search_patentswith keywords, CPC codes, or applicant names.get_patentwithsections=["biblio", "citations"]— the citations section includes NPL references automatically resolved to S2 papers.get_citing_patentsfinds patents that cite a given paper (coverage is incomplete — EPO OPS does not capture all citations).
Standards lookup
resolve_standard_identifierhandles messy input ("rfc 9000", "NIST SP 800-53 rev5") and normalises to canonical form. Use it first when the user gives an informal name.get_standardwithfetch_full_text=truedownloads and converts the standard via docling (requiresSCHOLAR_MCP_DOCLING_URL). This may block until conversion completes; it only returns a task ID if rate-limited.
Citation graph traversal
Forward and backward citations
get_citations— papers that cite a given paper (forward). Supports year, field, and minimum citation count filters.get_references— papers cited by a given paper (backward).- Both return paginated results. Use
offsetfor additional pages.
BFS graph expansion
get_citation_graph does breadth-first expansion from 1–10 seed papers.
depth1 is usually sufficient for a neighbourhood view. Depth 2–3 expands rapidly — always setmax_nodesto cap growth.direction="both"finds the densest cluster but doubles API calls per hop. Use"citations"or"references"when the user has a clear direction.- Combine with
year_start/year_endandmin_citationsto keep results focused. When filters are active, the tool fetches more candidates per node to compensate for filtering losses.
Bridge papers
find_bridge_papers finds the shortest citation path between two papers.
Useful for connecting seemingly unrelated work. Searches up to max_depth
hops (default 4). Use direction="both" for best coverage.
Recommendations
recommend_papers takes 1–5 positive paper IDs and optional negative IDs.
The positive examples define the topic; negatives steer away from unwanted
areas. For best results, pick positive examples that span the desired topic
rather than clustering around one sub-area.
recommend_books takes a subject string (e.g. "machine learning", "algorithms")
and returns popular books from Open Library sorted by edition count.
Citation generation
generate_citations produces BibTeX, CSL-JSON, or RIS for up to 100 papers.
enrich=true(the default) adds OpenAlex venue metadata for more complete BibTeX entries (journal names, volumes, pages).- Unresolved paper IDs are included in the output with a "not found" note rather than silently dropped.
Enrichment
- Book enrichment is automatic: any paper with an ISBN in
externalIdsgets abook_metadatafield (publisher, edition, cover URL, subjects) from Open Library. No extra tool call needed. enrich_paperadds OpenAlex fields (affiliations, funders, OA status, concepts) on demand. Useful when the user needs institutional or funding context beyond what S2 provides.
Full-text retrieval and PDF conversion
PDF tools are write-tagged and hidden in read-only mode (the default). The
user must set SCHOLAR_MCP_READ_ONLY=false and have a running docling-serve
instance (SCHOLAR_MCP_DOCLING_URL) for Markdown conversion.
Papers
fetch_paper_pdfdownloads the PDF with automatic fallback: S2 open-access → ArXiv → PubMed Central → Unpaywall.fetch_and_convertdoes fetch + convert in one call — usually what users want.fetch_pdf_by_urlhandles arbitrary PDF URLs (e.g. an author's website).convert_pdf_to_markdownconverts a local PDF file to Markdown — use this when you already have the PDF on disk (e.g. a paywalled paper obtained manually). Accepts an absolutefile_path.
Patents
fetch_patent_pdfdownloads via authenticated EPO OPS. Not all patents have PDFs available (older patents, some WO publications).
Standards
get_standardwithfetch_full_text=truefetches and converts in one call.
VLM enrichment
Pass use_vlm=true to fetch_and_convert, fetch_pdf_by_url,
fetch_patent_pdf, or convert_pdf_to_markdown for better formula and figure
extraction. (get_standard uses docling internally but does not expose the VLM
flag.) Requires SCHOLAR_MCP_VLM_API_URL and SCHOLAR_MCP_VLM_API_KEY.
VLM and standard conversions are cached separately — switching modes never
overwrites previous results.
Handling queued operations
PDF downloads (fetch_paper_pdf), patent PDF fetches (fetch_patent_pdf),
full pipeline runs (fetch_and_convert), URL-based fetches (fetch_pdf_by_url),
local file conversions (convert_pdf_to_markdown), and any tool that hits a
rate limit return a task ID when work is submitted to the background queue.
Cache hits return immediately without queuing. Example queued response:
{"queued": true, "task_id": "abc123", "tool": "..."}
Poll with get_task_result(task_id="abc123"). The response includes
status, elapsed_seconds, and a hint while running.
list_tasks shows all active tasks. Don't poll in a tight loop — tasks
typically complete in 10–30 seconds (PDF download) or 1–5 minutes (PDF
conversion with VLM).
Do not
- Do not call
search_paperswhen the user gives a specific identifier — useget_paperdirectly. - Do not call
get_authorwith a name and then silently pick the first candidate — ask the user to disambiguate. - Do not set
depth> 2 inget_citation_graphwithout also setting a tightmax_nodescap — the graph expands exponentially. - Do not use
fetch_pdf_by_urlfor EPO OPS patent URLs — they require authentication. Usefetch_patent_pdfinstead. - Do not forget
batch_resolvewhen the user gives a mixed list of identifiers — it routes each to the correct backend automatically.