name: paper-scout
description: Fetch arXiv papers by date range and topics, rank them for research value, and produce introduction digests. Use when the user wants a literature sweep, daily or weekly paper triage, or a written overview of the best papers in a niche, without relying on the arxivterminal local database.
metadata: {"clawphd":{"emoji":"🔭"}}
Paper Scout (arXiv pipeline)
Built-in tools implement a three-step workflow inspired by PaperFlow (arXiv query + metadata-style scoring) and PaperBrain (LLM screening of title/abstract + narrative digest).
Tools
| Tool | Purpose |
|---|---|
| `arxiv_fetch_range` | Queries arXiv for papers whose `submittedDate` falls in [`start_date`, `end_date`], with optional `keywords` (OR-matched in title/abstract) and `categories` (default `cs.AI`, `cs.LG`, `cs.CL`, `math.OC`). Returns the papers as JSON. |
| `arxiv_rank_papers` | Computes an enhanced metadata score for every paper; optionally enriches the top pool with Semantic Scholar / OpenAlex signals and optionally runs one batch LLM call for shortlist scoring; returns `selected` (top N). |
| `arxiv_paper_digest` | Produces an introduction report (Markdown) for the `selected` list; uses metadata, bibliometric cues, and LLM rationale when available. |
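
The source does not say how `arxiv_fetch_range` builds its request, so the following is only a minimal sketch of how a date-bounded query against the public arXiv API could be assembled. The field prefixes (`ti:`, `abs:`, `cat:`) and the `submittedDate:[... TO ...]` range come from the arXiv API; the function names `build_search_query` and `build_url`, the day-boundary times, and the sort parameters chosen here are assumptions for illustration.

```python
from datetime import date
from urllib.parse import urlencode

# Public arXiv API endpoint (returns an Atom feed).
ARXIV_API = "http://export.arxiv.org/api/query"

# Defaults listed in the tools table above.
DEFAULT_CATEGORIES = ("cs.AI", "cs.LG", "cs.CL", "math.OC")


def build_search_query(start_date, end_date, keywords=(), categories=DEFAULT_CATEGORIES):
    """Hypothetical helper: compose an arXiv `search_query` string."""
    # submittedDate expects YYYYMMDDHHMM; cover both days in full (assumption).
    window = f"submittedDate:[{start_date:%Y%m%d}0000 TO {end_date:%Y%m%d}2359]"
    clauses = [window]
    if keywords:
        # Each keyword may match either the title or the abstract (OR semantics).
        kw = " OR ".join(f'ti:"{k}" OR abs:"{k}"' for k in keywords)
        clauses.append(f"({kw})")
    if categories:
        cats = " OR ".join(f"cat:{c}" for c in categories)
        clauses.append(f"({cats})")
    return " AND ".join(clauses)


def build_url(search_query, max_results=100):
    """Hypothetical helper: URL-encode the query for a GET request."""
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


query = build_search_query(
    date(2026, 3, 1), date(2026, 3, 10),
    keywords=["world model", "VLA"],
    categories=["cs.RO", "cs.LG"],
)
url = build_url(query)
```

The resulting URL could then be fetched with any HTTP client (the notes mention `httpx`) and the Atom response parsed into paper records.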
Recommended workflow
1. Fetch
   - Call `arxiv_fetch_range` with `start_date`, `end_date`, `keywords`, and optionally `categories` / `max_results`.
   - Pass the returned JSON (whole object) into the next step.
2. Rank
   - Call `arxiv_rank_papers` with:
     - `papers_json`: the string from step 1 (or `json.dumps` of the object).
     - `interest_keywords`: what "high value" means for the user (used for scoring and LLM).
     - `top_n`: how many papers to keep.
     - `use_external_ranking`: defaults to `true`; uses Semantic Scholar first, with OpenAlex as fallback, for stronger selection.
     - `use_llm_refinement`: set to `true` for stronger selection (needs OpenRouter / VLM configured for ClawPhD).
3. Digest
   - Call `arxiv_paper_digest` with:
     - `selected_papers_json`: `json.dumps` of the `selected` array from step 2, or the full rank output object.
     - `interest_keywords`: same as step 2.
     - `language`: `zh` or `en`.
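
The handoff between ranking and digest can be sketched in Python as below. This is an illustrative helper, not part of the tools: `to_selected_json` is a hypothetical name, and the paper fields in the usage lines (`title`, `score`) are placeholders. Only the `selected` key is taken from the description above.

```python
import json


def to_selected_json(rank_output):
    """Hypothetical helper: normalise rank output into `selected_papers_json`.

    Accepts the rank output as a JSON string, as a dict containing
    `selected`, or as an already-extracted list of papers.
    """
    data = json.loads(rank_output) if isinstance(rank_output, str) else rank_output
    if isinstance(data, dict):
        if "selected" not in data:
            raise ValueError("rank output has no 'selected' key")
        data = data["selected"]
    if not isinstance(data, list):
        raise TypeError("expected a list of selected papers")
    # ensure_ascii=False keeps non-ASCII titles readable in the payload.
    return json.dumps(data, ensure_ascii=False)


# Placeholder rank output; real field names may differ.
rank_output = {"selected": [{"title": "Paper A", "score": 0.9}]}
selected_papers_json = to_selected_json(rank_output)
```

Since `arxiv_paper_digest` is described as accepting either form, a helper like this is optional; it mainly helps when the shortlist should be inspected or trimmed before the digest call.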
Notes
- No `arxiv` Python package is required; queries use the public Atom API via `httpx`.
- External ranking uses Semantic Scholar first and OpenAlex as fallback. It works without a key, but if you have one you can pass `semantic_scholar_api_key` or set `SEMANTIC_SCHOLAR_API_KEY` / `S2_API_KEY`.
- LLM refinement and rich digest need a configured multimodal/text provider (same stack as diagram tools, typically OpenRouter).
- For local cached papers and interactive CLI workflows, still use the `arxivterminal` skill (`arxiv fetch`, `scripts/arxiv_search.py`). All papers are stored under `paper_library/{subject}/{date}/` in the workspace root, where `{subject}` is a snake_case topic slug and `{date}` is `YYYY-MM-DD`.
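
As a rough illustration of the `paper_library/{subject}/{date}/` layout, the sketch below derives a directory path from a topic and a date. The slug rule (lowercase, runs of non-alphanumerics collapsed to `_`) and the names `subject_slug` and `paper_dir` are assumptions; the source only states that `{subject}` is snake_case and `{date}` is `YYYY-MM-DD`.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def subject_slug(topic):
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    if not slug:
        raise ValueError("topic produces an empty slug")
    return slug


def paper_dir(topic, day, root="paper_library"):
    """Build the library path relative to the workspace root."""
    # date.isoformat() yields YYYY-MM-DD.
    return PurePosixPath(root) / subject_slug(topic) / day.isoformat()


target = paper_dir("World Model", date(2026, 3, 10))
```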
Example (conceptual)
```text
arxiv_fetch_range(start_date="2026-03-01", end_date="2026-03-10", keywords=["world model", "VLA"], categories=["cs.RO","cs.LG"])
→ arxiv_rank_papers(papers_json=<output>, interest_keywords=["world model","VLA","embodied AI"], top_n=5, use_external_ranking=true, use_llm_refinement=true)
→ arxiv_paper_digest(selected_papers_json=<selected>, interest_keywords=[...], language="zh")
```