paper-scout - SKILL.md Agent Skill

name: paper-scout description: Fetch arXiv papers by date range and topics, rank them for research value, and produce introduction digests. Use when the user wants a literature sweep, daily or weekly paper triage, or a written overview of the best papers in a niche — without relying on the arxivterminal local database. metadata: {"clawphd":{"emoji":"🔭"}}

Paper Scout (arXiv pipeline)

Built-in tools implement a three-step workflow inspired by PaperFlow (arXiv query + metadata-style scoring) and PaperBrain (LLM screening of title/abstract + narrative digest).

Tools

Tool	Purpose
`arxiv_fetch_range`	Crawl submittedDate in `[start_date, end_date]`, optional keywords (OR in title/abstract), categories (default cs.AI, cs.LG, cs.CL, math.OC). Returns JSON `papers`.
`arxiv_rank_papers`	Enhanced metadata score every paper; optionally enriches the top pool with Semantic Scholar / OpenAlex signals and optionally runs one batch LLM call for shortlist scoring; returns `selected` (top N).
`arxiv_paper_digest`	Introduction report (Markdown) for the `selected` list; uses metadata, bibliometric cues, and LLM rationale when available.

Recommended workflow

Fetch
Call arxiv_fetch_range with start_date, end_date, keywords, and optionally categories / max_results.
Pass the returned JSON (whole object) into the next step.
Rank
Call arxiv_rank_papers with:
- papers_json: the string from step 1 (or json.dumps of the object).
- interest_keywords: what “high value” means for the user (used for scoring and LLM).
- top_n: how many papers to keep.
- use_external_ranking: defaults to true; uses Semantic Scholar first and OpenAlex fallback for stronger selection.
- use_llm_refinement: true for stronger selection (needs OpenRouter / VLM configured for ClawPhD).
Digest
Call arxiv_paper_digest with:
- selected_papers_json: json.dumps of the selected array from step 2, or the full rank output object.
- interest_keywords: same as step 2.
- language: zh or en.

Notes

No arxiv Python package is required; queries use the public Atom API via httpx.
External ranking uses Semantic Scholar first and OpenAlex as fallback. It works without a key, but if you have one you can pass semantic_scholar_api_key or set SEMANTIC_SCHOLAR_API_KEY / S2_API_KEY.
LLM refinement and rich digest need a configured multimodal/text provider (same stack as diagram tools — typically OpenRouter).
For local cached papers and interactive CLI workflows, still use the arxivterminal skill (arxiv fetch, scripts/arxiv_search.py). All papers are stored under paper_library/{subject}/{date}/ in the workspace root, where {subject} is a snake_case topic slug and {date} is YYYY-MM-DD.

Example (conceptual)

arxiv_fetch_range(start_date="2026-03-01", end_date="2026-03-10", keywords=["world model", "VLA"], categories=["cs.RO","cs.LG"])
→ arxiv_rank_papers(papers_json=<output>, interest_keywords=["world model","VLA","embodied AI"], top_n=5, use_external_ranking=true, use_llm_refinement=true)
→ arxiv_paper_digest(selected_papers_json=<selected>, interest_keywords=[...], language="zh")