name: research-lit description: Search and analyze research papers, find related work, summarize key ideas. Use when user says "find papers", "related work", "literature review", "what does this paper say", or needs to understand academic papers. argument-hint: [paper-topic-or-url] allowed-tools: Bash(*), Read, Glob, Grep, WebSearch, WebFetch, Write, arxivterminal, arxiv-doc-builder
Research Literature Review
Research topic: $ARGUMENTS
Constants
- PAPER_LIBRARY =
paper_library/— Local directory in the workspace root containing user's paper collection (PDFs). Directory structure:paper_library/{subject}/{date}/, where{subject}is a snake_case topic slug (e.g.diffusion_models,world_models) and{date}isYYYY-MM-DD(the date the paper was saved). - MAX_LOCAL_PAPERS = 20 — Maximum number of local PDFs to scan (read first 3 pages each). If more are found, prioritize by filename relevance to the topic.
- ARXIV_DOWNLOAD = false — When
true, download top 3-5 most relevant arXiv PDFs to PAPER_LIBRARY after search. Whenfalse(default), only fetch metadata (title, abstract, authors) via arXiv API — no files are downloaded. - ARXIV_MAX_DOWNLOAD = 5 — Maximum number of PDFs to download when
ARXIV_DOWNLOAD = true.
💡 Overrides:
/research-lit "topic" — sources: web— only search the web (skip all local)/research-lit "topic" — arxiv download: true— download top relevant arXiv PDFs/research-lit "topic" — arxiv download: true, max download: 10— download up to 10 PDFs
Data Sources
This skill checks multiple sources in priority order. All are optional — if a source is not configured or not requested, skip it silently.
Source Selection
Parse $ARGUMENTS for a — sources: directive:
- If
— sources:is specified: Only search the listed sources (comma-separated). Valid values:local,web,all. - If not specified: Default to
all— search every available source in priority order.
Examples:
/research-lit "diffusion models" → all (default)
/research-lit "diffusion models" — sources: all → all
/research-lit "diffusion models" — sources: local → local PDFs only
/research-lit "diffusion models" — sources: web → web search only
Source Table
| Priority | Source | ID | How to detect | What it provides |
|---|---|---|---|---|
| 1 | Local PDFs | local |
Glob: paper_library/{subject}/{date}/*.pdf |
Raw PDF content (first 3 pages) |
| 2 | Web search | web |
Always available (WebSearch) | arXiv, Semantic Scholar, Google Scholar |
Graceful degradation: If a source is unavailable, skip it silently and continue with the remaining sources.
Workflow
Step 0: Scan Local Paper Library
Before searching online, check if the user already has relevant papers locally:
Locate library: Scan PAPER_LIBRARY for PDF files
Glob: paper_library/**/*.pdfPapers are organized as
paper_library/{subject}/{date}/filename.pdf. When searching for a specific topic, prioritize matching{subject}directories first, then scan broadly.Filter by relevance: Match filenames and first-page content against the research topic. Skip clearly unrelated papers.
Summarize relevant papers: For each relevant local PDF (up to MAX_LOCAL_PAPERS):
- Read first 3 pages (title, abstract, intro)
- Extract: title, authors, year, core contribution, relevance to topic
- Flag papers that are directly related vs tangentially related
Build local knowledge base: Compile summaries into a "papers you already have" section. This becomes the starting point — external search fills the gaps.
📚 If no local papers are found, skip to Step 1. If the user has a comprehensive local collection, the external search can be more targeted (focus on what's missing).
Step 1: Search (external)
- Use WebSearch to find recent papers on the topic
- Check arXiv, Semantic Scholar, Google Scholar
- Focus on papers from last 2 years unless studying foundational work
- De-duplicate: Skip papers already found in local library
arXiv API search (always runs, no download by default):
Locate the fetch script and search arXiv directly:
# Try to find arxiv_fetch.py in the project
SCRIPT=$(find tools/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
# Search arXiv API for structured results (title, abstract, authors, categories)
python3 "$SCRIPT" search "QUERY" --max 10
If arxiv_fetch.py is not found, fall back to WebSearch for arXiv.
The arXiv API returns structured metadata (title, abstract, full author list, categories, dates) — richer than WebSearch snippets. Merge these results with WebSearch findings and de-duplicate.
Optional PDF download (only when ARXIV_DOWNLOAD = true):
After all sources are searched and papers are ranked by relevance:
# Download top N most relevant arXiv papers
python3 "$SCRIPT" download ARXIV_ID --dir paper_library/{subject}/$(date +%Y-%m-%d)/
- Only download papers ranked in the top ARXIV_MAX_DOWNLOAD by relevance
- Skip papers already in the local library
- 1-second delay between downloads (rate limiting)
- Verify each PDF > 10 KB
Step 2: Analyze Each Paper
For each relevant paper (from all sources), extract:
- Problem: What gap does it address?
- Method: Core technical contribution (1-2 sentences)
- Results: Key numbers/claims
- Relevance: How does it relate to our work?
- Source: Where we found it (local/web) — helps user know what they already have vs what's new
Step 3: Synthesize
- Group papers by approach/theme
- Identify consensus vs disagreements in the field
- Find gaps that our work could fill
- Note connections between papers that the user already has locally vs newly discovered ones
Step 4: Output
Present as a structured literature table:
| Paper | Venue | Method | Key Result | Relevance to Us | Source |
|-------|-------|--------|------------|-----------------|--------|
Plus a narrative summary of the landscape (3-5 paragraphs).
Step 5: Save (if requested)
- Save paper PDFs to
paper_library/{subject}/{YYYY-MM-DD}/ - Update related work notes in project memory
Key Rules
- Always include paper citations (authors, year, venue)
- Distinguish between peer-reviewed and preprints
- Be honest about limitations of each paper
- Note if a paper directly competes with or supports our approach
- Never fail because a tool is unavailable — always fall back gracefully to the next data source