search-papers - SKILL.md Agent Skill

name: search-papers description: Search for academic papers using Semantic Scholar, OpenReview, OpenAlex, and CrossRef. Use when you need to find related work, check novelty, gather citations, download/read full papers, or get BibTeX. argument-hint: "[query or topic]"

Academic Paper Search

Search query: $ARGUMENTS

Four APIs, each with a clear role:

API	Role	Auth
Semantic Scholar (S2)	Search, citation graph, sort by impact	`$S2_API_KEY` header (optional but recommended)
OpenAlex	Expanded search, OA PDF links, metadata	`$OPENALEX_API_KEY` query param (free, $1/day budget)
OpenReview	Peer reviews, BibTeX for ML venues	None needed for public data
CrossRef	BibTeX for any paper by DOI	None (use `mailto` for polite pool)

Important: Source .env before API calls:

set -a; source .env; set +a  # loads S2_API_KEY, OPENALEX_API_KEY

For detailed API reference with all endpoints and parameters, see reference.md.

Decision Guide: Which API When?

I want to...	Use
Find papers on a topic	S2 keyword search (`/paper/search`) or OpenAlex (`/works?search=...`)
Find most-cited papers	S2 bulk search (`/paper/search/bulk?sort=citationCount:desc`)
Look up a specific paper	S2 paper lookup (by ArXiv ID, DOI, or S2 ID)
Find exact title match	S2 title match (`/paper/search/match`)
Get citation graph	S2 citations + references endpoints
Resolve multiple papers at once	S2 batch POST (up to 500 IDs)
Read full text of a paper	See "Reading Full Papers" below
Read peer reviews + scores	OpenReview (`/notes?forum=<id>`)
Browse a venue's papers	OpenReview (`/notes?content.venueid=...`)
Get BibTeX	S2 `citationStyles` field (fastest), or CrossRef via `dx.doi.org` (most reliable)

Reading Full Papers

You must read the full text of the 3-5 most important papers. Abstracts alone are not sufficient for understanding methodology, experimental details, or making accurate comparisons.

Always download papers to literature/ and record notes in literature/README.md:

mkdir -p literature

Use this priority chain:

Priority 1: Paper has an arXiv ID → ar5iv HTML (best quality, most reliable)

Most ML/AI papers are on arXiv. Check the externalIds.ArXiv field from S2 results.

Use WebFetch on the ar5iv HTML version — this preserves tables, math, and structure far better than PDF-to-text extraction:

WebFetch(url="https://ar5iv.labs.arxiv.org/html/1706.03762", prompt="Extract the full methodology, experimental setup, and key results")

ar5iv converts arXiv LaTeX sources to HTML. The content is structured, tables are preserved, and WebFetch can summarize exactly what you need in a single call. You can make multiple WebFetch calls with different prompts to extract different sections.

Also download the PDF to literature/ for figures and as a local backup:

curl -s -L -o literature/1706.03762.pdf "https://arxiv.org/pdf/1706.03762"

You can read the PDF with the Read tool (pages parameter, max 20 pages per call) if you need to inspect figures or verify details. Rate limit: Wait 3 seconds between consecutive arXiv downloads.

Fallback — if ar5iv is unavailable for a paper, read the downloaded PDF directly:

Read(file_path="literature/1706.03762.pdf", pages="1-15")

Priority 2: Paper has no arXiv ID → OpenAlex OA links (free)

Query OpenAlex for the paper's open access PDF URL:

# Look up by DOI
curl -s "https://api.openalex.org/works/https://doi.org/10.1234/example?api_key=$OPENALEX_API_KEY&select=id,title,best_oa_location,open_access"

The response includes best_oa_location.pdf_url — a direct link to the PDF hosted by the publisher, PMC, or a repository. Download it:

curl -s -L -o literature/paper_name.pdf "$PDF_URL"

Then read with the Read tool as above.

Priority 3: No OA PDF available → WebFetch the publisher page

Use WebFetch to read the paper directly from the publisher's website:

WebFetch(url="https://doi.org/10.1234/example", prompt="Extract the full methodology, experimental setup, and key results")

Many publisher pages include the full text in HTML even when the PDF is paywalled. If the publisher page doesn't have full text, try searching for the paper title + "pdf" to find alternative sources.

After reading each paper

Update literature/README.md with the paper's filename, title, and key takeaways. This serves as a persistent index so you (and future sessions) can quickly see what's been read and what was learned.

Quick Start: Common Workflows

Literature search (find related work)

# 1. Relevance-ranked search (S2)
curl -s -H "x-api-key: $S2_API_KEY" \
  "https://api.semanticscholar.org/graph/v1/paper/search?query=YOUR+TOPIC&limit=10&fields=title,year,citationCount,abstract,externalIds,citationStyles"

# 2. Most-cited papers on the topic (S2)
curl -s -H "x-api-key: $S2_API_KEY" \
  "https://api.semanticscholar.org/graph/v1/paper/search/bulk?query=YOUR+TOPIC&sort=citationCount:desc&fields=title,year,citationCount&limit=10"

# 3. Search with OpenAlex (broader coverage, includes OA links)
curl -s "https://api.openalex.org/works?search=YOUR+TOPIC&filter=publication_year:>2020&sort=cited_by_count:desc&per-page=10&select=id,title,publication_year,cited_by_count,doi,open_access,best_oa_location&api_key=$OPENALEX_API_KEY"

Get BibTeX for a paper

# Option A: From S2 (include citationStyles field — has bibtex key)
curl -s -H "x-api-key: $S2_API_KEY" \
  "https://api.semanticscholar.org/graph/v1/paper/ArXiv:1706.03762?fields=title,citationStyles"

# Option B: From CrossRef via DOI (works for ALL DOI types)
curl -s -L -H "Accept: application/x-bibtex" "https://dx.doi.org/$DOI"

Deep dive on a paper's review on openreview

# Search OpenReview for the paper
curl -s "https://api2.openreview.net/notes/search?term=PAPER+TITLE&source=forum&limit=1"

# Get reviews (use forum ID from above)
curl -s "https://api2.openreview.net/notes?forum=$FORUM_ID&limit=50"

# Read full text — use arXiv if available (see "Reading Full Papers" above)

Search Tips

Use keyword phrases, not sentences — S2 natural language queries return 0 results
Start broad, then narrow: "regularization" → "L2 regularization neural networks"
Filter by year for recent work: &year=2023- (S2) or &filter=publication_year:>2022 (OpenAlex)
Use minCitationCount to skip low-impact: &minCitationCount=50 (S2) or &filter=cited_by_count:>50 (OpenAlex)
Multiple searches: 3-5 queries with different angles for coverage
Read full text: After identifying key papers, always download and read them — see "Reading Full Papers"

Rate limit

S2 abstracts may contain control chars (ESC/0x1b). Strip with: re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', text)
S2 openAccessPdf is often empty — use arXiv direct download or OpenAlex best_oa_location instead
S2 /paper/search does NOT support sort — use /paper/search/bulk instead
S2 /paper/{id}/citations does NOT support sort — newest first only
CrossRef /transform fails for arXiv DOIs — always use dx.doi.org
CrossRef references unreliable — use S2 for reference lists
OpenReview V2 wraps values: content.title.value, not content.title
OpenReview venue IDs have no trailing slash: ICLR.cc/2024/Conference
arXiv rate limit — wait 3 seconds between consecutive downloads
OpenAlex budget — $1/day free tier; list queries cost $0.0001, search costs $0.001; singleton lookups are free