paper-fetcher-vp - SKILL.md Agent Skill

name: paper-fetcher-vp
description: Use when the user provides a research paper screenshot, title, arXiv/OpenReview/publisher/project URL, or article excerpt and wants Codex to identify the paper, download and verify an official PDF, rename it with a research-field prefix, and report a Zotero Add Item by Identifier value such as arXiv ID or DOI.

Paper Fetcher

Purpose

Find the corresponding research paper from a screenshot, title, URL, or excerpt; download the verified official PDF into a user-provided research folder; and return the identifier the user can paste into Zotero's Add Item by Identifier dialog.

Required Target Folder

Ask for or infer a target folder before post-processing. Pass it explicitly with --target-dir.

Do not hardcode a personal vault path in public usage examples.

Source Priority

Prefer official sources in this order:

arXiv
OpenReview
ACL Anthology, NeurIPS, ICML, ICLR, ACM, IEEE, Springer, Elsevier, or other official publisher pages
Official project page
GitHub README that links to the paper

Avoid non-official mirrors unless no official PDF is available. Do not bypass paywalls or access controls.
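
The priority order above can be sketched as a small ranking helper. This is only an illustration: the host lists, the names source_rank and order_candidates, and the project_hosts parameter are assumptions and are not part of the skill or its helper script. Hosts that are not recognized get the lowest priority so that they are checked manually before use.

```python
from urllib.parse import urlparse

# Lower rank = more preferred. The host lists are illustrative assumptions,
# not an exhaustive registry of official sources.
RANKED_HOSTS = [
    (1, ("arxiv.org",)),
    (2, ("openreview.net",)),
    (3, ("aclanthology.org", "neurips.cc", "mlr.press", "iclr.cc",
         "acm.org", "ieee.org", "springer.com", "sciencedirect.com")),
    (5, ("github.com",)),
]
PROJECT_PAGE_RANK = 4   # official project page, confirmed by the caller
UNVERIFIED_RANK = 6     # unknown host: verify manually before downloading


def _host_matches(host, domain):
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def source_rank(url, project_hosts=()):
    """Return the priority rank of a candidate URL (1 is best)."""
    host = (urlparse(url).hostname or "").lower()
    for rank, domains in RANKED_HOSTS:
        if any(_host_matches(host, d) for d in domains):
            return rank
    if any(_host_matches(host, h.lower()) for h in project_hosts):
        return PROJECT_PAGE_RANK
    return UNVERIFIED_RANK


def order_candidates(urls, project_hosts=()):
    """Sort candidate URLs from most to least preferred source."""
    return sorted(urls, key=lambda u: source_rank(u, project_hosts))
```

A project page cannot be recognized from its host name alone, so the sketch expects the caller to pass hosts that have already been confirmed as official.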

Workflow

Extract paper title, authors, visible IDs, source URL, project URL, and candidate PDF URL from the user input.
Search online when needed and verify the strongest official match.
Identify the best Zotero identifier:
- Prefer arXiv ID.
- Use DOI if no arXiv ID exists.
- If neither exists, report not available and include another source ID such as OpenReview ID for reference.
Download the official PDF into the target folder or a temporary download path.
Read enough of the paper to classify it into exactly one filename prefix:
- RAG
- Agent
- SFT
- RL
- DL_Frameworks
- Other
Run scripts/paper_postprocess.py to verify and rename the PDF as {field}_{original paper title}.pdf.
Report Zotero identifier status in the final response.
Do not generate .bib files unless explicitly requested.
Do not create hand-written Zotero metadata entries through the Web API.
Never edit Zotero's local database files directly.
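
The identifier preference in the workflow above (arXiv ID first, then DOI, otherwise not available) can be sketched as follows. The function names and the returned dictionary shape are hypothetical and chosen for illustration only. The regular expression covers only new-style arXiv IDs such as 1707.06347, and dropping a trailing version suffix such as v2 is an assumption of this sketch.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# New-style arXiv IDs only (for example 1707.06347 or 2401.12345v2).
# Old-style IDs such as hep-th/9901001 are not handled in this sketch.
ARXIV_ID_RE = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?:v\d+)?(?!\d)")


def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the arXiv ID found in an arxiv.org URL, without version suffix."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "arxiv.org" and not host.endswith(".arxiv.org"):
        return None
    match = ARXIV_ID_RE.search(parsed.path)
    return match.group(1) if match else None


def choose_zotero_identifier(arxiv_id=None, doi=None, other_id=None):
    """Prefer the arXiv ID, fall back to the DOI, else report not available."""
    if arxiv_id:
        value, kind = arxiv_id, "arXiv"
    elif doi:
        value, kind = doi, "DOI"
    else:
        value, kind = "not available", None
    # other_id (for example an OpenReview ID) is kept for reference only.
    return {"value": value, "kind": kind, "other_source_id": other_id}
```

For example, extract_arxiv_id("https://arxiv.org/pdf/1707.06347") returns "1707.06347", and a paper with only an OpenReview ID yields the value "not available" with the OpenReview ID carried along as other_source_id.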

Field Prefix Selection

Choose the filename prefix by reading the title, abstract, introduction, method, and conclusion. Use the closest field when a paper spans multiple areas:

RAG: retrieval-augmented generation, vector retrieval, dense/sparse retrieval, indexing, query rewriting, reranking, knowledge-grounded generation, long-context retrieval.
Agent: LLM agents, tool use, planning, agent benchmarks, multi-agent systems, browser/computer-use agents, workflow automation, agent memory.
SFT: supervised fine-tuning, instruction tuning, alignment datasets, preference data preparation before RL, domain/task fine-tuning.
RL: reinforcement learning, RLHF, RLAIF, PPO, DPO-style preference optimization, reward models, policy optimization, decision-making agents when RL is central.
DL_Frameworks: training/inference systems, distributed training, compilers, CUDA kernels, PyTorch/TensorFlow/JAX/runtime work, model serving infrastructure.
Other: use only when none of the above fields is a reasonable fit.

Local Helper

Use scripts/paper_postprocess.py after downloading a PDF to sanitize the filename, verify that the file begins with the %PDF- header, move it into the target folder, and report Zotero identifier status.

Example:

python scripts\paper_postprocess.py `
  --pdf "<downloaded-pdf>" `
  --target-dir "<research-folder>" `
  --title "Agent Harness Engineering: A Survey" `
  --field Agent `
  --authors "Junjie Li and Xi Xiao and Yunbei Zhang" `
  --source-url "https://openreview.net/pdf?id=eONq7FdiHa" `
  --zotero

Legacy BibTeX sidecar generation is optional. Add --bib only when the user explicitly requests it:

python scripts\paper_postprocess.py `
  --pdf "<downloaded-pdf>" `
  --target-dir "<research-folder>" `
  --title "Proximal Policy Optimization Algorithms" `
  --field RL `
  --authors "John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov" `
  --year 2017 `
  --arxiv-id "1707.06347" `
  --source-url "https://arxiv.org/pdf/1707.06347" `
  --zotero `
  --bib
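
For orientation, the verification and renaming steps described for the helper can be sketched as below. This is not the code of scripts/paper_postprocess.py, whose internals are not shown here. The function names, the set of characters stripped from titles, and the error handling are assumptions made for this sketch.

```python
import re

# The six prefixes listed under Field Prefix Selection.
ALLOWED_FIELDS = {"RAG", "Agent", "SFT", "RL", "DL_Frameworks", "Other"}


def looks_like_pdf(path) -> bool:
    """Check that the file starts with the PDF header bytes %PDF-."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


def build_filename(field: str, title: str) -> str:
    """Build {field}_{title}.pdf with filesystem-unsafe characters removed."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field prefix: {field!r}")
    # Replace characters that are invalid in Windows filenames (assumed set),
    # plus control characters, then collapse runs of whitespace.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    if not cleaned:
        raise ValueError("title is empty after sanitizing")
    return f"{field}_{cleaned}.pdf"
```

With these assumptions, build_filename("Agent", "Agent Harness Engineering: A Survey") gives "Agent_Agent Harness Engineering A Survey.pdf". A header check like this catches HTML error pages saved with a .pdf extension, which is a common failure when a download link redirects to a login or paywall page.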

Zotero Handling

Use Zotero's Add Item by Identifier behavior as the primary import path. The user can paste the returned arXiv ID or DOI into Zotero so Zotero fetches canonical metadata from supported resolvers.

Do not store Zotero API credentials in this skill. Use the Zotero Web API or MCP only for lookup, duplicate checks, updates after a canonical item exists, or attachment management, and only when explicitly needed.

Final Response

Always include:

Paper title
arXiv ID, or arXiv: not found
DOI, or DOI: not found
Zotero Add Item by Identifier value, or not available
Other source ID when relevant, such as OpenReview ID
PDF source URL
Saved local path
File size
Zotero status
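
One way to keep the final response complete is to collect the fields above in a small structure and render them in a fixed order. The class name PaperReport, its field names, and the exact label wording are hypothetical. The skill itself does not prescribe an output format beyond the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperReport:
    """Hypothetical container for the fields of the final response."""
    title: str
    pdf_source_url: str
    saved_path: str
    file_size_bytes: int
    zotero_status: str
    arxiv_id: Optional[str] = None
    doi: Optional[str] = None
    other_source_id: Optional[str] = None

    def render(self) -> str:
        # Same preference as the workflow: arXiv ID, then DOI.
        identifier = self.arxiv_id or self.doi or "not available"
        lines = [
            f"Paper title: {self.title}",
            f"arXiv: {self.arxiv_id or 'not found'}",
            f"DOI: {self.doi or 'not found'}",
            f"Zotero Add Item by Identifier value: {identifier}",
        ]
        if self.other_source_id:
            lines.append(f"Other source ID: {self.other_source_id}")
        lines += [
            f"PDF source URL: {self.pdf_source_url}",
            f"Saved local path: {self.saved_path}",
            f"File size: {self.file_size_bytes} bytes",
            f"Zotero status: {self.zotero_status}",
        ]
        return "\n".join(lines)
```

Making the required fields mandatory constructor arguments means a missing source URL or saved path fails early instead of silently dropping out of the report.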

Failure Handling

If no reliable paper match is found, ask for a clearer screenshot/title/URL instead of guessing.
If the PDF cannot be verified, delete or ignore the invalid file and report the failed source.
If multiple candidate papers match, list candidates and choose the one with the strongest title/source match.
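
When several candidates match, a rough title-similarity score can help order them before the source check. The sketch below uses Python's difflib as a stand-in for whatever matching the agent actually performs. The function names and the 0.6 threshold are assumptions, and an empty result corresponds to the case where the user should be asked for clearer input.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(query: str, candidate: str) -> float:
    """Similarity in [0, 1] between two titles, based on matching characters."""
    return SequenceMatcher(None, _normalize(query), _normalize(candidate)).ratio()


def rank_candidates(query_title, candidate_titles, min_score=0.6):
    """Return (score, title) pairs above min_score, best match first."""
    scored = sorted(
        ((title_similarity(query_title, c), c) for c in candidate_titles),
        reverse=True,
    )
    return [(score, title) for score, title in scored if score >= min_score]
```

Title similarity alone is not enough to pick a paper. Authors and the official source should still be compared before downloading, since distinct papers can share near-identical titles.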