name: paper-fetcher-vp
description: Use when the user provides a research paper screenshot, title, arXiv/OpenReview/publisher/project URL, or article excerpt and wants Codex to identify the paper, download and verify an official PDF, rename it with a research-field prefix, and report a Zotero Add Item by Identifier value such as arXiv ID or DOI.
Paper Fetcher
Purpose
Find the corresponding research paper from a screenshot, title, URL, or excerpt; download the verified official PDF into a user-provided research folder; and return the identifier the user can paste into Zotero's Add Item by Identifier dialog.
Required Target Folder
Ask for or infer a target folder before post-processing. Pass it explicitly with --target-dir.
Do not hardcode a personal vault path in public usage examples.
Source Priority
Prefer official sources in this order:
- arXiv
- OpenReview
- ACL Anthology, NeurIPS, ICML, ICLR, ACM, IEEE, Springer, Elsevier, or other official publisher pages
- Official project page
- GitHub README that links to the paper
Avoid non-official mirrors unless no official PDF is available. Do not bypass paywalls or access controls.
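The priority order above can be sketched as a small ranking function. This is only an illustration: the name source_rank, the is_project_page hint, and the publisher host list are assumptions rather than part of the skill, and an unknown host still has to be checked by hand before it is trusted.

```python
from urllib.parse import urlparse

# Illustrative host list (an assumption); extend it for the venues you use.
PUBLISHER_HOSTS = {
    "aclanthology.org",
    "proceedings.neurips.cc",
    "proceedings.mlr.press",
    "dl.acm.org",
    "ieeexplore.ieee.org",
    "link.springer.com",
    "www.sciencedirect.com",
}


def source_rank(url: str, is_project_page: bool = False) -> int:
    """Return a priority rank for a candidate URL (lower is preferred)."""
    host = (urlparse(url).hostname or "").lower()
    if host == "arxiv.org" or host.endswith(".arxiv.org"):
        return 0
    if host == "openreview.net" or host.endswith(".openreview.net"):
        return 1
    if host in PUBLISHER_HOSTS:
        return 2
    if is_project_page:
        # A project page cannot be recognized from its host alone,
        # so the caller has to flag it after checking it manually.
        return 3
    if host == "github.com":
        return 4
    return 5  # unknown host: treat as a non-official mirror until verified


candidates = [
    "https://example.com/mirror/paper.pdf",  # hypothetical mirror URL
    "https://openreview.net/pdf?id=eONq7FdiHa",
    "https://arxiv.org/pdf/1707.06347",
]
best = min(candidates, key=source_rank)
print(best)  # https://arxiv.org/pdf/1707.06347
```

Ranking by host keeps the decision cheap and auditable, but it does not prove that a page is official. It only orders candidates that have already been matched to the paper.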
Workflow
- Extract paper title, authors, visible IDs, source URL, project URL, and candidate PDF URL from the user input.
- Search online when needed and verify the strongest official match.
- Identify the best Zotero identifier:
  - Prefer arXiv ID.
  - Use DOI if no arXiv ID exists.
  - If neither exists, report "not available" and include another source ID, such as an OpenReview ID, for reference.
- Download the official PDF into the target folder or a temporary download path.
- Read enough of the paper to classify it into exactly one filename prefix: RAG, Agent, SFT, RL, DL_Frameworks, or Other.
- Run scripts/paper_postprocess.py to verify and rename the PDF as {field}_{original paper title}.pdf.
- Report Zotero identifier status in the final response.
- Do not generate .bib files unless explicitly requested.
- Do not create hand-written Zotero metadata entries through the Web API.
- Never edit Zotero's local database files directly.
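The identifier rule in the workflow (arXiv ID first, then DOI, then "not available" plus another source ID) might look like the sketch below. The function names and the returned dictionary keys are hypothetical. The regular expression covers only new-style arXiv IDs such as 1707.06347, and dropping the version suffix is a choice made here, not something the skill requires.

```python
import re
from typing import Optional

# Matches new-style arXiv IDs in abs/pdf URLs; old-style IDs such as
# "hep-th/9901001" are deliberately not handled in this sketch.
ARXIV_URL_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?")


def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the arXiv ID embedded in a URL, without its version suffix."""
    match = ARXIV_URL_RE.search(url)
    return match.group(1) if match else None


def pick_zotero_identifier(
    arxiv_id: Optional[str] = None,
    doi: Optional[str] = None,
    other_id: Optional[str] = None,
) -> dict:
    """Apply the preference order: arXiv ID, then DOI, else 'not available'."""
    if arxiv_id:
        return {"value": arxiv_id, "kind": "arXiv"}
    if doi:
        return {"value": doi, "kind": "DOI"}
    # Neither identifier exists: keep another source ID for reference only.
    return {"value": "not available", "kind": None, "other_source_id": other_id}


arxiv_id = extract_arxiv_id("https://arxiv.org/pdf/1707.06347")
print(pick_zotero_identifier(arxiv_id=arxiv_id))
print(pick_zotero_identifier(other_id="eONq7FdiHa"))
```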
Field Prefix Selection
Choose the filename prefix by reading the title, abstract, introduction, method, and conclusion. Use the closest field when a paper spans multiple areas:
- RAG: retrieval-augmented generation, vector retrieval, dense/sparse retrieval, indexing, query rewriting, reranking, knowledge-grounded generation, long-context retrieval.
- Agent: LLM agents, tool use, planning, agent benchmarks, multi-agent systems, browser/computer-use agents, workflow automation, agent memory.
- SFT: supervised fine-tuning, instruction tuning, alignment datasets, preference data preparation before RL, domain/task fine-tuning.
- RL: reinforcement learning, RLHF, RLAIF, PPO, DPO-style preference optimization, reward models, policy optimization, decision-making agents when RL is central.
- DL_Frameworks: training/inference systems, distributed training, compilers, CUDA kernels, PyTorch/TensorFlow/JAX/runtime work, model serving infrastructure.
- Other: use only when none of the above fields is a reasonable fit.
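A keyword count can give a rough first guess at the prefix before the paper is read in full. The sketch below is an assumption-heavy heuristic, not the skill's classification method: the keyword lists are a small sample of the terms above, suggest_field is a hypothetical name, and the result should be confirmed against the abstract, method, and conclusion.

```python
import re

# Sample keywords per field (an assumption); "Other" is the fallback.
FIELD_KEYWORDS = {
    "RAG": ["retrieval", "reranking", "query rewriting", "indexing"],
    "Agent": ["agent", "tool use", "planning", "multi-agent"],
    "SFT": ["supervised fine-tuning", "instruction tuning", "fine-tuning"],
    "RL": ["reinforcement learning", "rlhf", "rlaif", "ppo", "dpo",
           "reward model", "policy optimization"],
    "DL_Frameworks": ["distributed training", "compiler", "cuda", "pytorch",
                      "tensorflow", "jax", "model serving"],
}


def count_keyword(text: str, keyword: str) -> int:
    # Require a word boundary before the keyword so that "ppo" does not
    # match inside "support"; plurals such as "agents" still count.
    return len(re.findall(r"\b" + re.escape(keyword), text))


def suggest_field(text: str) -> str:
    """Return the field whose keywords occur most often, else 'Other'."""
    lowered = text.lower()
    scores = {
        field: sum(count_keyword(lowered, kw) for kw in keywords)
        for field, keywords in FIELD_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Other"


print(suggest_field("Agent Harness Engineering: A Survey"))  # Agent
```

Ties resolve to whichever field appears first in the dictionary, which is one more reason to treat the output as a suggestion only.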
Local Helper
Use scripts/paper_postprocess.py after downloading a PDF to sanitize the filename, verify that the file starts with the %PDF- signature, move it into the target folder, and report Zotero identifier status.
Example:
python scripts\paper_postprocess.py `
--pdf "<downloaded-pdf>" `
--target-dir "<research-folder>" `
--title "Agent Harness Engineering: A Survey" `
--field Agent `
--authors "Junjie Li and Xi Xiao and Yunbei Zhang" `
--source-url "https://openreview.net/pdf?id=eONq7FdiHa" `
--zotero
Legacy BibTeX sidecar generation is optional. Use it only when explicitly requested:
python scripts\paper_postprocess.py `
--pdf "<downloaded-pdf>" `
--target-dir "<research-folder>" `
--title "Proximal Policy Optimization Algorithms" `
--field RL `
--authors "John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov" `
--year 2017 `
--arxiv-id "1707.06347" `
--source-url "https://arxiv.org/pdf/1707.06347" `
--zotero `
--bib
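For orientation, the verify, sanitize, and move steps could be sketched as follows. This is not the actual content of scripts/paper_postprocess.py: the function names are hypothetical and the set of replaced filename characters is an assumption chosen to be safe on Windows paths.

```python
import re
import shutil
from pathlib import Path


def is_pdf(path: Path) -> bool:
    """Check that the file begins with the %PDF- signature."""
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def sanitize_title(title: str) -> str:
    """Replace characters that are invalid in Windows filenames."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", title)
    # Collapse runs of whitespace and trim trailing dots and spaces.
    return re.sub(r"\s+", " ", cleaned).strip(" .")


def postprocess(pdf: Path, target_dir: Path, title: str, field: str) -> Path:
    """Verify the download, then move it to {field}_{title}.pdf."""
    if not is_pdf(pdf):
        raise ValueError(f"{pdf} does not start with %PDF-")
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / f"{field}_{sanitize_title(title)}.pdf"
    shutil.move(str(pdf), str(dest))
    return dest
```

Checking the first five bytes catches the common failure where a login page or HTML error response was saved with a .pdf extension. It does not confirm that the rest of the file is a well-formed PDF.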
Zotero Handling
Use Zotero's Add Item by Identifier behavior as the primary import path. The user can paste the returned arXiv ID or DOI into Zotero so Zotero fetches canonical metadata from supported resolvers.
Do not store Zotero API credentials in this skill. Use the Zotero Web API or MCP only for lookups, duplicate checks, updates after a canonical item exists, or attachment management when explicitly needed.
Final Response
Always include:
- Paper title
- arXiv ID, or "arXiv: not found"
- DOI, or "DOI: not found"
- Zotero Add Item by Identifier value, or "not available"
- Other source ID when relevant, such as OpenReview ID
- PDF source URL
- Saved local path
- File size
- Zotero status
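The checklist above, including its fallback strings, can be assembled by a small formatter. The sketch below is illustrative only: format_report, its parameters, and the exact field labels are assumptions, and the skill does not prescribe a layout for the final response.

```python
from typing import Optional


def format_report(
    title: str,
    pdf_url: str,
    saved_path: str,
    size_bytes: int,
    zotero_status: str,
    arxiv_id: Optional[str] = None,
    doi: Optional[str] = None,
    other_id: Optional[str] = None,
) -> str:
    """Build the final response, filling in fallbacks for missing IDs."""
    identifier = arxiv_id or doi or "not available"
    lines = [
        f"Paper title: {title}",
        f"arXiv ID: {arxiv_id}" if arxiv_id else "arXiv: not found",
        f"DOI: {doi}" if doi else "DOI: not found",
        f"Zotero Add Item by Identifier value: {identifier}",
    ]
    if other_id:  # only shown when relevant, e.g. an OpenReview ID
        lines.append(f"Other source ID: {other_id}")
    lines += [
        f"PDF source URL: {pdf_url}",
        f"Saved local path: {saved_path}",
        f"File size: {size_bytes} bytes",
        f"Zotero status: {zotero_status}",
    ]
    return "\n".join(lines)
```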
Failure Handling
- If no reliable paper match is found, ask for a clearer screenshot/title/URL instead of guessing.
- If the PDF cannot be verified, delete or ignore the invalid file and report the failed source.
- If multiple candidate papers match, list candidates and choose the one with the strongest title/source match.
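One way to compare candidate titles, and to decide when to ask the user instead of guessing, is a similarity score with a cutoff. The sketch below uses difflib from the Python standard library. The function names and the 0.6 threshold are assumptions to tune, and the score covers title similarity only, not how official the source is.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_title_match(
    query: str, candidates: List[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the closest candidate title, or None if nothing is reliable."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: title_similarity(query, c))
    # Below the threshold, report no match so the caller can ask for a
    # clearer screenshot, title, or URL instead of guessing.
    return best if title_similarity(query, best) >= threshold else None


candidates = [
    "Agent Harness Engineering: A Survey",
    "Proximal Policy Optimization Algorithms",
]
print(best_title_match("proximal policy optimization algorithms", candidates))
```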